DivyaratanPopli / Kinship_Inference

This is a tool to estimate pairwise relatedness from ancient DNA, taking into account contamination, runs of homozygosity (ROH), and ascertainment bias.
GNU General Public License v3.0

Please change the code so that it does not always split sorted/indexed BAM files by chromosome (the index exists precisely so that regions can be looked up by position in the original file) #24

Open · zmaroti opened this issue 2 weeks ago

zmaroti commented 2 weeks ago

1) Please add a '-h' option to KINgaroo. At the moment you have to read the code to see which options are actually available, and a bioRxiv manuscript referred to the '-s 0' option as '--s 0'. Both forms run, so the incorrect '--s 0' did not raise any error, but there was no way to tell whether the BAM files were treated as indexed/sorted or not, since split BAMs were created anyway (see below).
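A minimal sketch of what is being asked for, assuming the command line is (or could be) built with Python's `argparse`: `argparse` registers `-h/--help` automatically and lists every option with its help string. The `-s` semantics and its default of 1 (split) are assumptions inferred from the description above, not taken from KINgaroo's code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # argparse adds -h/--help on its own; the help strings below are
    # illustrative, not KINgaroo's actual documentation.
    parser = argparse.ArgumentParser(prog="KINgaroo")
    parser.add_argument("-bam", required=True,
                        help="directory containing the BAM files")
    # Assumed semantics: 1 = split BAMs by chromosome, 0 = use them as they are.
    parser.add_argument("-s", type=int, choices=(0, 1), default=1,
                        help="1: split BAM files by chromosome; "
                             "0: use the sorted/indexed BAMs as they are")
    return parser


if __name__ == "__main__":
    # parse_args (not parse_known_args) rejects unknown options such as '--s'.
    args = build_parser().parse_args()
    print(args)
```

With `parse_args`, a mistyped `--s 0` stops with an "unrecognized arguments" error instead of running on silently with the default `-s 1`.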

2) Could you please implement the '-s' option so that, when the BAM file is already sorted and indexed and contains all the data (chromosomes 1-22), it does not create a splitbam directory and duplicate the data into it by chromosome? An indexed BAM file can be read one chromosome at a time: you only have to add the '-r chr' region option in samtools or, if the code uses pysam, pass the chromosome name to read from the original file. Splitting the BAM just creates unnecessary work and duplicates your data. It also makes you think twice before extending the reference size, as the method does not scale well in this scenario.
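A sketch of the region-based alternative, assuming the per-chromosome step is a `samtools mpileup` call (where `-r` selects a region) or a pysam read loop; which of the two KINgaroo actually uses is not stated here. `region_commands` and `reads_on_chromosome` are hypothetical helper names. Both read each chromosome directly from the original sorted/indexed BAM through its index, so no per-chromosome copies are written.

```python
from typing import Iterable, Iterator, List


def region_commands(bam_paths: Iterable[str],
                    chromosomes: Iterable[object]) -> List[List[str]]:
    """Build one `samtools mpileup -r <chrom> <bam>` call per (BAM, chromosome).

    Hypothetical helper: every call queries the original indexed BAM,
    so no splitbam directory is needed.
    """
    # Materialise once so a generator is not exhausted after the first BAM.
    # Names ("1" vs "chr1") must match the contigs in the BAM header.
    chroms = [str(c) for c in chromosomes]
    commands = []
    for bam in bam_paths:
        for chrom in chroms:
            commands.append(["samtools", "mpileup", "-r", chrom, bam])
    return commands


def reads_on_chromosome(bam_path: str, chrom: str) -> Iterator[object]:
    """pysam variant: stream the reads of one chromosome from an indexed BAM."""
    import pysam  # imported lazily so region_commands works without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        if not bam.has_index():
            raise ValueError(f"{bam_path} has no index; run `samtools index` first")
        # fetch() uses the .bai index to jump straight to this contig.
        yield from bam.fetch(chrom)
```

For 22 autosomes and N individuals this yields 22 × N region queries against N files, instead of writing 22 × N new BAM files first.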

3) Not crucial, but it would be useful to have a '-bamlist' option in addition to providing a single directory (with the '-bam' option) containing your BAM files, since BAM files are often spread across several sequencing runs. The same result can be achieved by creating soft links to your BAM files in a subdirectory, but it would be nice to have an option for this.
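A minimal sketch of such an option, again assuming an `argparse`-based command line. `-bamlist` is the hypothetical option proposed above (a text file with one BAM path per line), made mutually exclusive with the existing `-bam` directory option; `collect_bams` is likewise a hypothetical helper.

```python
import argparse
import glob
import os
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="KINgaroo")
    # Exactly one input source: a directory (-bam) or a list file (-bamlist).
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-bam", help="directory containing the BAM files")
    source.add_argument("-bamlist",
                        help="text file with one BAM path per line (hypothetical)")
    return parser


def collect_bams(args: argparse.Namespace) -> List[str]:
    """Return the BAM paths selected by either -bam or -bamlist."""
    if args.bamlist:
        paths = []
        with open(args.bamlist) as fh:
            for line in fh:
                line = line.strip()
                # Skip blank lines and '#' comments; keep the listed order.
                if line and not line.startswith("#"):
                    paths.append(line)
    else:
        paths = sorted(glob.glob(os.path.join(args.bam, "*.bam")))
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError(f"BAM files not found: {missing}")
    return paths
```

BAM files from different sequencing runs can then be listed in one file, with no need for a directory of soft links.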