MrOlm / inStrain

Bioinformatics program inStrain
MIT License

Missing columns in "genome_info.tsv" output #158

Closed by etd530 10 months ago

etd530 commented 11 months ago

Dear Matt,

When running inStrain for several BAM files mapped against the same set of genomes, I noticed that some columns in the genome_info.tsv output files were present in some cases, but not in others. Specifically, the columns are linked_SNV_count, SNV_distance_mean, r2_mean and d_prime_mean. Could this be a bug, or am I missing something?

The exact command was:

inStrain profile $file ../../wolbachia.genomes.dereplicated_ani99_maf90_concat.fna --stb ../../contigNames_all_dereplicated_ani99_maf90.tsv -o ${prefix}.drep_ani99_maf90.instrain -p 15 --database_mode

Where $file corresponds to a different BAM file each time.

Many thanks,

Eric

MrOlm commented 11 months ago

Hi Eric,

This happens when there are no SNVs on that genome that can be linked by paired reads. You can verify this by checking that no scaffolds from those genomes appear in the linkage.tsv file (https://instrain.readthedocs.io/en/latest/example_output.html#linkage-tsv).

Apologies that this isn't very well documented.

-Matt
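
A minimal sketch of the check described above, for anyone who wants to script it. It assumes that linkage.tsv is tab-separated with a header containing a `scaffold` column, and that the `--stb` file is a header-less, two-column table of scaffold name and genome name. The function names are hypothetical helpers and are not part of inStrain.

```python
import csv


def load_stb(stb_path):
    """Map scaffold name -> genome name from a header-less, two-column stb file."""
    scaffold_to_genome = {}
    with open(stb_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:
                scaffold_to_genome[row[0]] = row[1]
    return scaffold_to_genome


def genomes_with_linkage(linkage_path, stb_path, scaffold_column="scaffold"):
    """Return the genomes that have at least one scaffold listed in linkage.tsv.

    Assumption: linkage.tsv has a header row with a column named `scaffold`.
    """
    scaffold_to_genome = load_stb(stb_path)
    genomes = set()
    with open(linkage_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            genome = scaffold_to_genome.get(row[scaffold_column])
            if genome is not None:
                genomes.add(genome)
    return genomes


def genomes_without_linkage(linkage_path, stb_path, scaffold_column="scaffold"):
    """Return the genomes in the stb file with no scaffold in linkage.tsv."""
    all_genomes = set(load_stb(stb_path).values())
    linked = genomes_with_linkage(linkage_path, stb_path, scaffold_column)
    return all_genomes - linked
```

If the explanation above holds, the genomes returned by `genomes_without_linkage` should be the ones whose linkage columns are absent or empty in genome_info.tsv.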

etd530 commented 11 months ago

Hi Matt,

Thanks for explaining, now it makes sense :) If I may, I would suggest that in future versions the program always outputs all columns even if one is completely empty, which would make it easier to compare output for multiple files.

Thanks again!

Eric

MrOlm commented 10 months ago

Fixed in v1.8.0
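
For genome_info.tsv files already produced with an earlier version, one possible workaround is to pad the missing columns after the fact so that all files share the same header. The sketch below uses only the four column names mentioned in this thread and assumes the file is tab-separated with a header row. `add_missing_columns` is a hypothetical helper, not an inStrain function, and it appends the columns at the end, which may not match the column order that v1.8.0 writes.

```python
import csv

# Column names taken from the issue text above.
LINKAGE_COLUMNS = [
    "linked_SNV_count",
    "SNV_distance_mean",
    "r2_mean",
    "d_prime_mean",
]


def add_missing_columns(in_path, out_path, required=LINKAGE_COLUMNS):
    """Copy a TSV file, appending any required columns that are absent.

    Added columns are left empty for every row. Returns the list of
    column names that had to be added.
    """
    with open(in_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    missing = [name for name in required if name not in fieldnames]
    fieldnames.extend(missing)

    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=fieldnames,
            delimiter="\t",
            restval="",  # value written for columns a row does not have
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(rows)
    return missing
```

Running this over each sample's genome_info.tsv before concatenating or comparing them should give every file the same set of columns, with empty cells where no linked SNVs were found.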