printing genome1 and genome2 strain name while sorting

FelixKrueger / SNPsplit

Allele-specific alignment sorting

http://felixkrueger.github.io/SNPsplit/

GNU General Public License v3.0


printing genome1 and genome2 strain name while sorting #14

Closed vivekbhr closed 6 years ago

vivekbhr commented 7 years ago

Hi @FelixKrueger

During SNPsplit sort, the strain names are not added to the BAM file names (the suffixes are just _genome1 and _genome2), nor are they printed in the SNPsplit sort log files. The only way to find out which strains genome1 and genome2 correspond to is to look back at the log of genome_generate.

It would be good to either replace the _genome1/_genome2 suffix with the strain ID or (probably even easier) print the strain names for genome1 and genome2 in the log file of SNPsplit sort.

Thanks. Best, Vivek

FelixKrueger commented 7 years ago

Hi Vivek,

I have been thinking about this point as well, but at the time SNPsplit is run there is actually no reference anywhere in the BAM file to which strain is genome 1 or genome 2. In fact, the genomes may even have been constructed by means other than SNPsplit_genome_preparation, so there may not even be such a thing as strain names. The output of SNPsplit consistently calls the reference (in the SNP file) genome 1 and the alternative base genome 2, so in a way we left it to the user to keep track of which genome they used as the reference and which as the alternative.

I guess we could add options along the lines of --genome1_name and --genome2_name for when SNPsplit is run, so that these names are then used instead of simply genome 1/genome 2?

vivekbhr commented 7 years ago

True. If the genomes were prepared by other means, then the names for genome 1 and 2 would have to be provided externally. Then again, users can also substitute the names themselves after sorting, so I have no strong preference on this.
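For anyone who wants to substitute the names after sorting, here is a minimal Python sketch. It is not part of SNPsplit. It assumes the sorted BAM files carry a `genome1` or `genome2` token in their file names (as in the `_genome1`/`_genome2` suffixes described above), and the strain names `strainA` and `strainB` are placeholders. The helper names `relabel` and `rename_outputs` are made up for this example.

```python
import re
from pathlib import Path

# Matches "genome1" or "genome2" only when it stands as its own token,
# e.g. "sample_genome1.bam" or "sample.genome2.bam", but not "genome10".
_TOKEN = re.compile(r"(?<![A-Za-z0-9])genome[12](?![A-Za-z0-9])")


def relabel(filename: str, names: dict) -> str:
    """Return filename with genome1/genome2 tokens replaced by strain names."""
    return _TOKEN.sub(lambda m: names[m.group(0)], filename)


def rename_outputs(directory: str, names: dict, dry_run: bool = True) -> list:
    """Rename BAM files in directory; with dry_run=True only report the plan."""
    planned = []
    for path in sorted(Path(directory).glob("*.bam")):
        new_name = relabel(path.name, names)
        if new_name == path.name:
            continue  # no genome1/genome2 token in this file name
        planned.append((path.name, new_name))
        if not dry_run:
            path.rename(path.with_name(new_name))
    return planned


if __name__ == "__main__":
    # Placeholder strain names (assumption); use the strains you actually used.
    strain_names = {"genome1": "strainA", "genome2": "strainB"}
    for old, new in rename_outputs(".", strain_names, dry_run=True):
        print(f"{old} -> {new}")
```

The dry run is the default so the planned renames can be checked against the genome preparation log before any file is touched; pass `dry_run=False` to actually rename.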