JaneliaSciComp / msg

Multiplexed Shotgun Genotyping
http://genomics.princeton.edu/AndolfattoLab/MSG.html

If extract_ref_alleles finds no references, it should die with a useful error; continue the pipeline with the original genome #18

Open gregpinero opened 12 years ago

gregpinero commented 12 years ago

We should force extract-ref-alleles.py to die in this case. The pipeline then needs to reset some parameters that come from msg.cfg: if there are no ref alleles for genome updating, an updated genome cannot be produced. The user should be alerted, but the msg pipeline can continue by treating the original ref genome as the relevant parental genome. I expect this will almost never be a problem (updating data sets will usually contain >100,000 reads), but since you found the bug, we might as well squash it so it doesn't return to haunt us.
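A minimal sketch of the pipeline-side fallback described above. It assumes the extraction step runs as a separate command that exits nonzero when it finds no refs, and it uses a plain dict in place of the settings read from msg.cfg. The function name `run_update_step` and the keys `parent_genome`, `original_genome`, `updated_genome` and `genome_updated` are hypothetical and chosen only for illustration. They are not MSG's real parameter names.

```python
import subprocess
import sys


def run_update_step(cmd, config):
    """Run the genome-updating command; fall back to the original genome on failure.

    `config` is a plain dict standing in for settings read from msg.cfg.
    All key names used here are hypothetical.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # The extraction step died (e.g. no ref alleles were found):
        # alert the user, then continue with the unmodified reference.
        sys.stderr.write(
            "WARNING: genome update failed; continuing with the original "
            "reference as the parental genome.\n" + result.stderr
        )
        config["parent_genome"] = config["original_genome"]
        config["genome_updated"] = False
    else:
        # Extraction succeeded, so downstream steps use the updated genome.
        config["parent_genome"] = config["updated_genome"]
        config["genome_updated"] = True
    return config
```

Recording the decision in the config, instead of only logging it, lets later steps check whether an updated genome actually exists.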

Misc Context:

On Nov 1, 2011, at 10:12 AM, Pinero, Gregory wrote:

I figured this out. I traced the issue back through the files and found that update_minQV must be 0 for the toy data. Otherwise the data don't meet the quality threshold, so extract-ref-alleles.py finds no refs and never creates any ref files, which breaks things downstream.

Perhaps it's worth having extract-ref-alleles.py die with an appropriate error message if it doesn't find any refs? That would be easier for a user to diagnose than a cryptic R message a few modules downstream.

Yes, I should have checked the msg.cfg file first :-( For some reason I really thought it was the same as my previous run.

-Greg
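Greg's suggestion (die with a clear message when no refs pass the quality filter) can be sketched as follows. This assumes allele calls arrive as `(contig, position, allele, qv)` tuples and that a call "meets the threshold" when its quality is at least update_minQV. Both the record layout and the inclusive comparison are assumptions, and `extract_ref_alleles` here is a hypothetical function, not the real script's code.

```python
import sys


def extract_ref_alleles(calls, min_qv):
    """Keep allele calls whose quality is at least `min_qv` (hypothetical layout).

    `calls` is assumed to be an iterable of (contig, position, allele, qv) tuples.
    """
    refs = [c for c in calls if c[3] >= min_qv]
    if not refs:
        # Fail loudly here instead of silently writing no ref files and
        # letting a cryptic R error surface several modules downstream.
        # sys.exit() with a string prints it to stderr and exits with status 1.
        sys.exit(
            f"extract-ref-alleles: no reference alleles passed "
            f"update_minQV={min_qv}; check update_minQV in msg.cfg "
            f"(e.g. the toy data needs update_minQV = 0)."
        )
    return refs
```

Because `sys.exit` gives a nonzero exit status, a calling pipeline can detect the failure and fall back to the original genome instead of crashing later.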

...

I now see this error in msgRun3's error output:

Error in nsites[match(contigs, nsites$fac), ]$Freq :
  $ operator is invalid for atomic vectors
Calls: contig.info
Execution halted

I believe it's referring to ~line 432 in hmmlib.R. Is either of you familiar with this particular code? Or is there anyone I can ask? Otherwise I'll dive in and figure out what's going on there. Perhaps R was upgraded on the grid?

-Greg