amplab / snap

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
https://www.microsoft.com/en-us/research/project/snap/
Apache License 2.0

sort issue #120

Closed jpofmars closed 3 years ago

jpofmars commented 5 years ago

Hello,

I get sort errors from tools like GATK and samtools. It seems that the output was sorted on the 8th column (PNEXT), whereas GATK expects it to be sorted on the 4th column (POS). I'm using SNAP version 1.0beta.23.

Here is an example of the error:

##### ERROR M00402:105:000000000-C9YJM:1:2113:13559:16965 81 3 3146516 23 2M1I9M1D1M1D91M 3 3146515 -105 TACAATAATGTTTATTTTTCTAACATATTTTTAAAAATAAACATTATTGAATTGAATAGAAGGTCCTTACTCTTTTCATCAGGAAGTAAGTCAGCTTGCAGTAT HFFHHHGGG3FHFHAFHEHHHHGGHHFHHHFHHHHHHGHHHFHFHHFHHHHHHHHHHGHHGFCFHGGDGHFCFHFFHHHFFHGGGGGGGGCGFBFFFFFBBBBA LB:Z:XXX PG:Z:SNAP RG:Z:XXX PL:Z:ILLUMINA NM:i:6
##### ERROR M00402:105:000000000-C9YJM:1:2113:13559:16965 161 3 3146515 23 12M1D1M1D91M 3 3146516 105 TTCAATAATGTTTATTTTTCTAACATATTTTTAAAAATAAACATTATTGTATTGAATAGAAGGTCCTTACTCTTTTCATCAGGAAGTAAGTCAGCTTGCAGTAT BAAAAFBDFFDFGGGFGGGEGGHGFHHHHGFHHCGHCFHHHHHFH5DGF5BGGF55EGHFHFFHFHHHB3DGHGHFGHEHHHHHCHBFHHHF5BGHHHHEFBGH LB:Z:XXX PG:Z:SNAP RG:Z:XXX PL:Z:ILLUMINA NM:i:8

thank you

bolosky commented 5 years ago

It’s definitely not trying to sort by pnext (column 8) instead of pos (column 4). Did you look in the output of SNAP to be sure that they’re really out of order, instead of just relying on GATK? If so, then it’s probably a bug and I’ll follow up with you to see if I can repro it.
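For anyone wanting to check the SNAP output directly rather than rely on GATK, a minimal sketch in Python might look like the following. It assumes uncompressed, tab-separated SAM text and only flags records whose POS (column 4) is lower than the previous record's POS on the same reference; it does not check the order of the references themselves. The function name `find_unsorted` and the placeholder records are illustrative only and are not part of SNAP, GATK, or samtools.

```python
def find_unsorted(sam_lines):
    """Return (line_no, rname, pos, prev_pos) for every record whose POS
    is lower than the POS of the preceding record on the same reference."""
    problems = []
    prev_rname, prev_pos = None, None
    for line_no, line in enumerate(sam_lines, start=1):
        # Skip SAM header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        rname, pos = fields[2], int(fields[3])  # columns 3 (RNAME) and 4 (POS)
        if rname == prev_rname and pos < prev_pos:
            problems.append((line_no, rname, pos, prev_pos))
        prev_rname, prev_pos = rname, pos
    return problems


# Placeholder records: only RNAME, POS and PNEXT come from the report above;
# every other field is a dummy value used to keep the lines short.
demo = [
    "read1\t81\t3\t3146516\t23\t*\t3\t3146515\t-105\t*\t*",
    "read1\t161\t3\t3146515\t23\t*\t3\t3146516\t105\t*\t*",
]
print(find_unsorted(demo))  # [(2, '3', 3146515, 3146516)]
```

To run it on a real file, pass an open file handle, for example `find_unsorted(open("output.sam"))`, where `output.sam` stands in for the actual SNAP output path. An empty list means no out-of-order POS values were found within any contiguous run of one reference.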

bolosky commented 3 years ago

The newly released 1.0 version has sorting (and duplicate marking) that produces output that goes through GATK (at least HaplotypeCaller) fine. I'm closing this issue. If you still see a problem, please reopen it or create a new issue.