Fix train_half_life_linear_model.R

ewallace commented 3 years ago

I could not get train_half_life_linear_model.R to run on the current master branch. There are several problems, and it needs to be fixed soon.

The problems start at around line 119 (master):

single_count_median_3UTR_motifs_freq <- motif_count_function(motif_regex$regex, single_count_median_3UTR_threePrimeUTR$threePrimeUTR, gene_name = single_count_median_3UTR_motifs_freq$transcriptName)

colnames(single_count_median_3UTR_motifs_freq) <-  c("geneName", "transcriptSeque", motif_regex$newMotifIUPAC)

# combine motif,codon and chan decay datasets
single_count_decay_prediction_dataset_chan <- single_count_median_3UTR_motifs_freq %>%
  inner_join(chan_decay_hlife) %>%
  inner_join(codon_freq) %>%
  mutate(UTR3_length = str_length(threePrimeUTR))

There are a series of problems:

transcriptName and geneName are used inconsistently; fix by picking one and using it throughout.
The colnames call appears to have a different number of names on each side, I think? This needs detailed debugging of what the names actually are.
mutate(UTR3_length = str_length(threePrimeUTR)) is applied to a tibble that does not have a threePrimeUTR column; probably fix by defining it earlier, in yeast_3UTRs, and passing it forward to the other tibbles that use it.
Inconsistent coding style, and an organization where adjacent lines of code skip between different topics, make the script hard to read and fix.
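
For discussion, here is a minimal sketch of how the first three points might be addressed together. It is not the code from the script: the toy sequences, the half-life values, the motif list, and the names motifs, motif_counts, motif_freq, and decay are all invented for illustration, and str_count stands in for motif_count_function, whose actual interface is not shown here. The idea is to key everything on transcriptName, compute UTR3_length once where the sequence is defined, and check the number of column names before assigning them.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy stand-in for yeast_3UTRs (sequences invented for illustration).
# UTR3_length is computed once, here, and carried forward.
yeast_3UTRs <- tibble(
  transcriptName = c("YAL001C", "YAL002W"),
  threePrimeUTR  = c("TGTAAATAATTT", "ATATTCGG")
) %>%
  mutate(UTR3_length = str_length(threePrimeUTR))

# Hypothetical motif list; the real script takes these from motif_regex.
motifs <- c("TGTAAATA", "ATATTC")

# One column of counts per motif, one row per transcript.
# str_count is used here in place of motif_count_function (an assumption).
motif_counts <- vapply(
  motifs,
  function(m) str_count(yeast_3UTRs$threePrimeUTR, m),
  integer(nrow(yeast_3UTRs))
)

motif_freq <- bind_cols(yeast_3UTRs, as_tibble(motif_counts))

# Guard the rename: fail loudly if the two sides differ in length.
new_names <- c("transcriptName", "threePrimeUTR", "UTR3_length", motifs)
stopifnot(length(new_names) == ncol(motif_freq))
colnames(motif_freq) <- new_names

# Toy half-life table (values invented), keyed by the same column name.
decay <- tibble(
  transcriptName = c("YAL001C", "YAL002W"),
  hlife          = c(10.5, 22.1)
)

# An explicit `by` makes the join key visible and avoids silent mismatches.
dataset <- inner_join(motif_freq, decay, by = "transcriptName")
```

Stating the join key explicitly also means that a leftover geneName column would cause an error rather than an unexpected natural join.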

ewallace commented 3 years ago

Another problem with this is that it takes several minutes to run the codon_count code earlier in the script, which creates a major incentive to not debug it.

Best to fix this by computing codon counts in a different script, count_codons.R, and just reading in the table to train_half_life_linear_model.R.
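
A rough sketch of that split, assuming the counts are exchanged as a tab-separated file. The file path, the toy counts, and the column layout (one row per transcript, one integer column per codon) are assumptions for illustration, not what the script currently produces.

```r
library(readr)
library(tibble)

# Hypothetical location for the cached table; a temp dir is used so the
# sketch runs anywhere. The real path would live in the repository.
codon_counts_file <- file.path(tempdir(), "codon_counts.tsv")

# --- count_codons.R: the slow step, run once ---
# Toy counts invented for illustration.
codon_counts <- tibble(
  transcriptName = c("YAL001C", "YAL002W"),
  AAA = c(3L, 1L),
  GCT = c(0L, 2L)
)
write_tsv(codon_counts, codon_counts_file)

# --- train_half_life_linear_model.R: just read the table ---
codon_freq <- read_tsv(
  codon_counts_file,
  col_types = cols(transcriptName = col_character(), .default = col_integer())
)
```

With this arrangement, debugging the model script only costs the time of one file read.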

ewallace commented 3 years ago

I have attempted to address this in commit bcb319b, on branch train_halflife_problems. Let's discuss.

DimmestP commented 3 years ago

Sam will check the train_halflife_problems branch and add if there are no bugs

DimmestP / chimera_project_manuscript

Fix train_half_life_linear_model.R #173