epi2me-labs / pychopper

cDNA read preprocessing

issue of alignment scores file #41

Closed zhangqc723 closed 11 months ago

zhangqc723 commented 1 year ago

In my pychopper alignment scores file, I find some identical lines (for example, line 1 and line 2 below). Could you tell me why?

zhangqc723 commented 1 year ago

5621486d-2446-40f0-ac9f-2fdd3ab199bf runid=c590fb43de99977d6e044d3c03311c335c20ad77 read=4305 ch=283 start_time=2023-08-01T21:45:18.473421+08:00 flow_cell_id=PAS00612 protocol_group_id=basecalled sample_id=20230801-HML231087-P4-PAS00612-fast parent_read_id=5621486d-2446-40f0-ac9f-2fdd3ab199bf basecall_model_version_id=dna_r10.4.1_e8.2_5khz_400bps_fast@v4.2.0 38 80 MyVNP 4 +

5621486d-2446-40f0-ac9f-2fdd3ab199bf runid=c590fb43de99977d6e044d3c03311c335c20ad77 read=4305 ch=283 start_time=2023-08-01T21:45:18.473421+08:00 flow_cell_id=PAS00612 protocol_group_id=basecalled sample_id=20230801-HML231087-P4-PAS00612-fast parent_read_id=5621486d-2446-40f0-ac9f-2fdd3ab199bf basecall_model_version_id=dna_r10.4.1_e8.2_5khz_400bps_fast@v4.2.0 38 80 MyVNP 4 +

5621486d-2446-40f0-ac9f-2fdd3ab199bf runid=c590fb43de99977d6e044d3c03311c335c20ad77 read=4305 ch=283 start_time=2023-08-01T21:45:18.473421+08:00 flow_cell_id=PAS00612 protocol_group_id=basecalled sample_id=20230801-HML231087-P4-PAS00612-fast parent_read_id=5621486d-2446-40f0-ac9f-2fdd3ab199bf basecall_model_version_id=dna_r10.4.1_e8.2_5khz_400bps_fast@v4.2.0 3363 3408 MyVNP 100 -
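For anyone wanting to check a scores file for exact duplicates like the ones above, here is a minimal sketch. It assumes that each line ends with five whitespace-separated fields (start, end, primer name, score, strand) and that everything before them is the read name, as in the lines pasted in this comment. The function names `parse_score_line` and `find_duplicates` are hypothetical helpers, not part of pychopper.

```python
from collections import Counter


def parse_score_line(line):
    """Split one scores-file line into (read_name, start, end, primer, score, strand).

    Assumption: the last five whitespace-separated fields are
    start, end, primer name, score and strand; the read name (which may
    itself contain spaces) is everything before them.
    """
    name, start, end, primer, score, strand = line.rstrip("\n").rsplit(None, 5)
    return name, int(start), int(end), primer, float(score), strand


def find_duplicates(lines):
    """Return a dict mapping each record seen more than once to its count."""
    counts = Counter(parse_score_line(line) for line in lines if line.strip())
    return {record: n for record, n in counts.items() if n > 1}


if __name__ == "__main__":
    import sys

    # Usage: python find_dups.py scores.bed
    with open(sys.argv[1]) as handle:
        for record, n in find_duplicates(handle).items():
            print(n, *record, sep="\t")
```

Records are compared on all six fields, so two hits on the same read at different positions are not reported as duplicates.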

nrhorner commented 1 year ago

Hi @zhangqc723

I'm not sure why you are getting duplicate entries here. I'll take a look and get back to you.

zhangqc723 commented 1 year ago

Thanks. I would also like to know: if a read is not in the scores file, can a primer still be present in it?

zhangqc723 commented 1 year ago

@nrhorner In my pychopper output, one read does not appear in the scores file, yet I find that it contains a primer. For example, this is my config file:

MyVNP TGAGAGACAAGATTGTTCGTGGACACGAGCATCAGCAGCATACGA

MySSP TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATTGCGCAATG

This is my read from unclass_output:

TATGTTGTGGCCTGGTTCAGTTACGTATTTGTTTAGAGACAAGATTGTTCGTGGACACGAGCATCAGCCGCATAGGATTTTTTTTTTTTTTTTTTTTTTTAATTATTGAGATGGAGTCTCACTCTGTCGCCCAGGCTGGAGTGCAGTGGTGCAATCTCGGCTCACTGCAACTCCATCTCCTGGGTTCAAACAATTCTTGTGTCTCAGCCTACCAGTAGCCGGGATTACAGACTCGAGCCACCACGCCCAGCTAATTTTTTTTGTTGTTGTTTAAAGTAGAGACGGGTTTCACCACGTTGGCCAGGCTGGTCTCAAACTCCTGGCCTCAAGCGATCCGCCTGCCTCAGCCTCCCAAAGGGCTGGGATTACAGGCATGAGCCACCGTGCCGGTGTGTCCATGGGTTTTCTTTTATGCTTTAAGGAGGGTGATCTCAGCATGGAGCCCAAAGACTCTGGCACTTCCCTTTGTCCTTAAACCTCCTTGGCCAAATTTAGATTTAGTCTCTTTGAAGTGCTGGTTCACAATAGGATGTTTTTGTTGTTTTGTTAAAGTGGGAAAGAGCAATGAAAGAAAAAGTGAGAGAATTTTAACACTTGGGGCTGACACTGTCCAGGAAAACCATGGATGTATAGTGTGGGAGGGAGGAGGAGAAACTGACGTTTTTGAAGTTGGAAAGGTGTGGAGGTGGCATCTTAATTTAGTTGTGCAAAGAGGAAGTAAAAAGATTGACAATAAGACGTGTATAGATCAGAATAGAACGTTCTTAAGAGGGAAAGATGAAACATAAGCTGTTAATATTTTAATTCCCAGTCTGTTCTTAAAGATGAGAAAGCTTTGATCAGCAAGGCGAACATTCTAGAAAGAACTATTACGTGTATGTGGGGGGTGGGGAATGTGTGTATGCTGCATCTGTCAGACCTAGTCATTACAGTGTTCTGGGCGTAAGAACTCCGATTCTCATATTGCATTCTCTTCCATCACTTTATTTGGGGTGAAGCACATCGTCCTGTCAGTATCCACATTTGAAAAATAAAGAGATCCTGGCTAAATTGGGATCTCAAGTTCACTTAGTTTTTAGTAAGGGGAACTTGGTGAAAAATCGACTTGTGAGGTCTCCAGAAACACTTAATTGATAATGAGTCAAAAGGCATTACTCTTGGCATGTGAATATTGGATGTGAGCTAGAGGGTCAGTCAAATGCCTGTGGAGCCTGGATTCATGTTCTTTCCCGTTTGTCAGTAATCCTTTCTAATGTTCCAGTTCCCATGATGTTGATTTTAGTGGAATAAAACTTGAACCACTTAGTTATGATGTTGTTACTGTGTTGGGAACGCAGCAGCCACCAGACCACCAAGAGCAACTGTAGGTTGGGCTTGGTGGTGCTGGTTTAGTGTTGTGGCTGATGAGGTGTAACCCAGGAAATTTTTATTTTTGCTTTTAAAAAAACAACTCATCTGTGGTCATCTGTAAGTGGAATATTGGACTATAAGGACTCTAAGTTACCAAAAACATTTTGTGACTTAGCCTGAGGCTAACAGAAATAACTTCCCCCATGTTCAGATAGTTTAGGCATATCTGAGTTGGTGGAGGAAATGTGATCTGTTCCTAGCCCTATGTGCTAGGCAGGAGTGCCTGGGTATCCAAGAGTGAGTGAGAGGCCGGCGCGATGCATGCCTGTAATTCCAGCACTCTGGGAGGCGAAGCGCTTGAGCCCAAGAGTTCAAGACCAGCCTGGGCAACATGGGAAAAACCTGTCTCTACGAAAAAAAAAAAAAAAAAAAAAAAAAAAATCGTATGTGCTGATGCTCAGCAATTAGTAA

This is the NCBI BLAST result: [image]

In fact, a MyVNP primer is contained in this read. Could you tell me why it did not appear in the scores file?
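A quick way to check independently of BLAST whether a primer occurs approximately in a read is a semi-global edit-distance search over the read and its reverse complement. The sketch below is only an illustration in plain Python. It is not pychopper's own alignment or scoring, and a low edit distance here does not imply that pychopper's score cutoff would accept the hit. The names `revcomp` and `best_primer_hit` are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def best_primer_hit(primer, read):
    """Return (edit_distance, end_position) of the best approximate
    occurrence of `primer` anywhere in `read`.

    Semi-global alignment: the whole primer must be aligned, while the
    read may have unaligned sequence before and after the hit at no cost.
    """
    # Row 0 is all zeros: a hit may start at any read position for free.
    prev = [0] * (len(read) + 1)
    for i, p in enumerate(primer, start=1):
        cur = [i] + [0] * len(read)
        for j, r in enumerate(read, start=1):
            cur[j] = min(
                prev[j - 1] + (p != r),  # match or substitution
                prev[j] + 1,             # primer base missing from the read
                cur[j - 1] + 1,          # extra base in the read
            )
        prev = cur
    # Last row: cost of aligning the full primer ending at each read position.
    best_end = min(range(len(read) + 1), key=prev.__getitem__)
    return prev[best_end], best_end


def check_both_strands(primer, read):
    """Best hit of the primer on the read as given and on its reverse complement."""
    return {
        "+": best_primer_hit(primer, read),
        "-": best_primer_hit(primer, revcomp(read)),
    }
```

Calling `check_both_strands(primer, read)` with a primer sequence and a read gives the number of mismatches and indels of the best hit on each strand, which helps separate near-perfect hits from heavily degraded ones.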

nrhorner commented 1 year ago

Hi @zhangqc723, I'm unable to replicate your problem with the identical entries in the alignment scores BED file output. Would you be able to share a few of the affected reads so I can debug, please? Also, please include the command you used and your pychopper version.

nrhorner commented 1 year ago

> Thanks. I would also like to know: if a read is not in the scores file, can a primer still be present in it?

Yes, the scores file details the primer hits.

nrhorner commented 11 months ago

Closing due to lack of response.