bcgsc / NanoSim

Nanopore sequence read simulator
Other
217 stars 51 forks source link

Bug in besthit_to_histogram.py #129

Closed austin-abbvie closed 2 years ago

austin-abbvie commented 2 years ago

Hi! Through working through a bug that utilizes NanoSim (thanks for making this tool!) I found there was an error in the following line.

https://github.com/bcgsc/NanoSim/blob/fedc3a395a49527df17c345efd6f3daec97858d1/src/besthit_to_histogram.py#L320

This syntax was updated at some point and I believe the correct line should read

cs_string = get_cs(alnm.cigarstring, alnm.get_tag('MD'))

Thank you!

cheny19 commented 2 years ago

Hi @austin-abbvie,

What specific error did you encounter when running the program?

We switched to the above code in this commit. @SaberHQ , do you recall why was that?

Thanks, Chen

austin-abbvie commented 2 years ago

This was in the ScNapBar pipeline (https://github.com/austin-abbvie/single-cell-nanopore). It was a generic error when calling the function saying that the alignment object did not have the attribute original_sam_line.

SaberHQ commented 2 years ago

Since it is been a while, honestly I don't remember what was the exact reason I changed it from "alnm.cigar" to "alnm.original_sam_line.split()[5]". Perhaps due to some bugs not finding the cigar tag in sam alignments.

However, I can look into it and run some tests to see if everything works or not.

cheny19 commented 2 years ago

@austin-abbvie I tried to run your pipeline to reproduce the error, however I cannot clone the https://github.com/nanoporetech/single-cell-nanopore.git as it does not exist.

kmnip commented 2 years ago

I agree with @austin-abbvie that cigarstring should be used, but the function get_cs must handle the cases when cigarstring is None (i.e. when CIGAR is * in the SAM record).

@SaberHQ replaced cigar back then because he probably realised that it was deprecated in pysam and used the 6th column in the SAM record instead.

@cheny19 , You used the wrong URL, it should be: https://github.com/austin-abbvie/single-cell-nanopore.git

cheny19 commented 2 years ago

Ah, I see, I was simply copying the command from the README, I'll give it another shot tomorrow. @austin-abbvie you may need to change your README then.

austin-abbvie commented 2 years ago

@cheny19 I am working on the dev branch with a variety of changes that includes upgrading nanosim to V3.0.1. Make sure to clone the dev branch. This error should also be present in the source repository, which is also currently undergoing some changes on dev as well.

cheny19 commented 2 years ago

@austin-abbvie I have changed the code accordingly and made some tests to make sure it runs without error. To make the cigar string read function work, it is also important to make sure there's MD tag in the alignment which is not default in minimap2, just something to keep in mind.

austin-abbvie commented 2 years ago

Thank you! And yes, we run Minimap2 with the proper flags to generate an MD tag.