ExaScience / elprep

elPrep: a high-performance tool for analyzing sequence alignment/map files in sequencing pipelines.

Support for multiple input files #47

Closed matthdsm closed 3 years ago

matthdsm commented 3 years ago

Hi,

I was wondering if it would be possible to add support for multiple input files. That way, elprep could be used to merge multiple split BAM files into a single final BAM. GATK also supports this approach.

Thanks M

geertvandeweyer commented 3 years ago

If you provide a folder with BAM files as input, "elprep split" takes them all as input. I'm not sure whether the other commands can do this as well.

leonorpalmeira commented 3 years ago

Does either of you know whether "elprep sfm" also allows this?

caherzee commented 3 years ago

Both "elprep split" and "elprep sfm" can take a path to multiple input files as input, but elprep assumes that the headers of the input files are identical and will not perform any merging of headers. If the headers of the inputs are different, then you may want to first create a merged header with a different tool. Also see our README under "elprep split" "Description" for more details.

Proper merging of headers is something we have on our to-do list.
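Since elprep assumes identical headers, it can be worth checking that assumption before passing a folder of files. Below is a minimal sketch of such a check, not part of elprep. It assumes the header of each input has already been extracted as plain text (for example with `samtools view -H`); the function names `normalized` and `first_mismatch` are made up for illustration.

```python
def normalized(header_text):
    """Split header text into lines, dropping trailing whitespace and blank lines."""
    return [line.rstrip() for line in header_text.splitlines() if line.strip()]


def first_mismatch(header_texts):
    """Return the index of the first header that differs from header 0, or None.

    Hypothetical helper: compares headers line by line, in order.
    """
    reference = normalized(header_texts[0])
    for index, text in enumerate(header_texts[1:], start=1):
        if normalized(text) != reference:
            return index
    return None


# Toy headers, invented for this example.
header_a = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
header_b = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:2000\n"

same = first_mismatch([header_a, header_a])
different = first_mismatch([header_a, header_a, header_b])
```

If `first_mismatch` returns `None`, the headers match and the inputs fit elprep's assumption; otherwise the returned index points at the first file whose header needs attention.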

leonorpalmeira commented 3 years ago

Thanks a lot for those details. In our case, it would be separate mappings of each sequenced lane from the same sample with the exact same Read Group (@RG), so most of the header would be the same. But indeed, the @PG lines, which are specific to each bwa mapping invocation, would then contain the information of only one of the lanes.

Thanks!
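For a per-lane setup like this, a quick way to confirm that only the @PG lines differ is to group the header lines by record type and compare the groups. This is a rough sketch under the assumption that each header is available as plain text; `lines_by_type` and `differing_record_types` are hypothetical helper names, and the lane headers are invented sample data.

```python
def lines_by_type(header_text):
    """Group header lines by record type (@HD, @SQ, @RG, @PG, @CO)."""
    groups = {}
    for line in header_text.splitlines():
        if line.startswith("@"):
            record_type = line.split("\t", 1)[0]
            groups.setdefault(record_type, []).append(line)
    return groups


def differing_record_types(header_texts):
    """Return the sorted record types whose lines differ from the first header."""
    grouped = [lines_by_type(text) for text in header_texts]
    all_types = set().union(*(group.keys() for group in grouped))
    return sorted(
        record_type
        for record_type in all_types
        if any(
            group.get(record_type, []) != grouped[0].get(record_type, [])
            for group in grouped[1:]
        )
    )


# Invented sample headers: same @HD, @SQ and @RG, different bwa command lines.
shared = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@RG\tID:s1\tSM:sample\n"
lane1 = shared + "@PG\tID:bwa\tPN:bwa\tCL:bwa mem ref.fa lane1_R1.fq lane1_R2.fq\n"
lane2 = shared + "@PG\tID:bwa\tPN:bwa\tCL:bwa mem ref.fa lane2_R1.fq lane2_R2.fq\n"

diff = differing_record_types([lane1, lane2])
```

With these sample headers, `diff` contains only `@PG`, which matches the situation described above: the output would be usable, but its @PG lines would describe a single lane.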

matthdsm commented 3 years ago

Thanks for the info. I must've missed this in the README.