DimmestP / chimera_project_manuscript


Fix train_half_life_linear_model.R #173

Closed ewallace closed 3 years ago

ewallace commented 3 years ago

I could not get train_half_life_linear_model.R to run on the current master branch. There are several problems, and they need to be fixed soon.

The problems start at around line 119 (master):

single_count_median_3UTR_motifs_freq <- motif_count_function(motif_regex$regex, single_count_median_3UTR_threePrimeUTR$threePrimeUTR, gene_name = single_count_median_3UTR_motifs_freq$transcriptName)

colnames(single_count_median_3UTR_motifs_freq) <-  c("geneName", "transcriptSeque", motif_regex$newMotifIUPAC)

# combine motif,codon and chan decay datasets
single_count_decay_prediction_dataset_chan <- single_count_median_3UTR_motifs_freq %>%
  inner_join(chan_decay_hlife) %>%
  inner_join(codon_freq) %>%
  mutate(UTR3_length = str_length(threePrimeUTR))

There is a series of problems:

ewallace commented 3 years ago

Another problem is that the codon_count code earlier in the script takes several minutes to run, which is a strong disincentive to debugging the rest of the script.

Best to fix this by computing codon counts in a separate script, count_codons.R, and just reading the resulting table into train_half_life_linear_model.R.
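
A minimal sketch of that split, in base R only. This is an illustration under stated assumptions, not the code in the repository: the toy sequences, the `count_codons()` helper, the long-format layout (one row per gene and codon), and the output path are all hypothetical.

```r
# --- count_codons.R: slow step, run once and cache the result ---
# Toy input (hypothetical); the real script would read the ORF sequences instead.
orf_seqs <- c(geneA = "ATGGCTGCTTAA", geneB = "ATGAAAGCTTGA")

# Split one in-frame sequence into consecutive triplets and tabulate them.
count_codons <- function(orf_seq) {
  n_codons <- nchar(orf_seq) %/% 3
  starts <- seq(1, by = 3, length.out = n_codons)
  table(substring(orf_seq, starts, starts + 2))
}

# Build one row per gene and codon (long format; an assumption).
codon_counts <- do.call(rbind, lapply(names(orf_seqs), function(gene) {
  counts <- count_codons(orf_seqs[[gene]])
  data.frame(geneName = gene,
             codon = names(counts),
             count = as.integer(counts),
             stringsAsFactors = FALSE)
}))

# Hypothetical output location; a real script would write into the project tree.
codon_counts_file <- file.path(tempdir(), "codon_counts.tsv")
write.table(codon_counts, codon_counts_file,
            sep = "\t", quote = FALSE, row.names = FALSE)

# --- train_half_life_linear_model.R: fast step, read the cached table ---
codon_freq <- read.delim(codon_counts_file, stringsAsFactors = FALSE)
```

With this layout the model script never recomputes the counts, so the joins further down can be rerun and debugged in seconds. If the existing joins expect one column per codon, the cached table would need to be reshaped to wide format before it is written out.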

ewallace commented 3 years ago

I have attempted to address this in commit bcb319b, on branch train_halflife_problems. Let's discuss.

DimmestP commented 3 years ago

Sam will check the train_halflife_problems branch and add it if there are no bugs.