benedictpaten / marginPhase

MIT License

general enhancement #11

Open aechchiki opened 5 years ago

aechchiki commented 5 years ago

Not really an issue, but a possible documentation enhancement, whenever you have time :)

1) Maybe mention that the params files can be found in the params folder. You're probably already working on this, but it would also be good to know a bit more about how these params were generated (what's the reasoning behind them?). And what's the difference between the gap file and the other one (for PacBio & Nanopore)?

2) I got an error when feeding in a gzipped reference fasta. Maybe worth mentioning that in the docs? The error only shows up after the BAM step, which had already taken a while for me.

> Parsing prior probabilities on positions from reference sequences: /scratch/beegfs/monthly/aechchik/amphioxus/hm2/amphio_A_ref_D.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /scratch/beegfs/monthly/aechchik/amphioxus/hm2/amphio_A_ref_D.fa.gz.  Maybe you should run 'samtools faidx /scratch/beegfs/monthly/aechchik/amphioxus/hm2/amphio_A_ref_D.fa.gz'

3) And yes, as was discussed in another issue, state explicitly that the reference file should be the haploid version of the genome.

4) Is it feasible (or are you planning for the next release) to multi-thread the operations? From what I see, it is memory-intensive (about 4x the size of the genome I am feeding in: ~500 Mbp genome -> ~2 GB RAM) but not (yet) parallelizable.
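
Regarding point 2, something like the check below could be run before the long BAM step to catch the problem early. It is only a sketch in Python, not part of marginPhase: the function names `classify_compression` and `check_reference` are made up for illustration. It relies on the fact that a bgzip (BGZF) file is a gzip file whose header carries an extra `BC` subfield, which plain gzip output lacks.

```python
import struct

GZIP_MAGIC = b"\x1f\x8b"


def classify_compression(header: bytes) -> str:
    """Classify the first bytes of a file as 'plain', 'gzip', or 'bgzf'."""
    if len(header) < 4 or header[:2] != GZIP_MAGIC:
        return "plain"  # no gzip magic number: uncompressed (or some other format)
    flags = header[3]
    # BGZF requires deflate (CM == 8) and the FEXTRA flag (bit 2) to be set.
    if header[2] != 8 or not flags & 0x04 or len(header) < 12:
        return "gzip"
    (xlen,) = struct.unpack_from("<H", header, 10)
    extra = header[12:12 + xlen]
    pos = 0
    # Walk the extra subfields looking for SI1='B', SI2='C', SLEN=2.
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if (si1, si2, slen) == (66, 67, 2):
            return "bgzf"
        pos += 4 + slen
    return "gzip"


def check_reference(path: str) -> str:
    """Hypothetical pre-flight check: fail fast on a plain-gzip reference."""
    with open(path, "rb") as fh:
        kind = classify_compression(fh.read(1024))
    if kind == "gzip":
        raise ValueError(
            f"{path} is gzip-compressed and cannot be indexed; "
            "decompress it or re-compress it with bgzip first"
        )
    return kind
```

Calling `check_reference` on the reference path before starting the run would raise immediately for a file like the `.fa.gz` above, instead of failing at the fai indexing step.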

I'll add more points here if I get more ideas. Keep up the good work!

Best, Amina