broadinstitute / CODECsuite

analysis pipeline for CODEC data

error when trimming with the samples sequenced with MGI T7 #12

Closed lys1001s closed 8 months ago

lys1001s commented 1 year ago

Hello, I have been using CODEC Suite successfully and, until now, have had no issues with samples sequenced on NovaSeq. However, I'm running into a problem when trimming the FASTQ files of samples sequenced on MGI T7. The error log is below:

read 1 name E200002744L1C001R03802620803/1:GAGCCTACTCAGTCAACG and read 2 name E200002744L1C001R03802620803/2:GTGTCGAACACTTGACGG do not match!

This error appears for all reads, and the log size is several gigabytes.

Below, I've attached the headers of the demultiplexed results of the sample sequenced with T7 for your reference. Here are the first lines from R1.fastq.gz and R2.fastq.gz:

@E200002744L1C001R03802620803/1:GAGCCTACTCAGTCAACG
@E200002744L1C001R03802620803/2:GTGTCGAACACTTGACGG

Thank you.

ruolin commented 1 year ago

@lys1001s Thanks for reporting this issue. We have never tested CODEC on an MGI system, and I suspect the cause is the different read naming convention. I may have a quick fix for you. In the meantime, you could also check whether removing the '/' and everything after it from the read names works.
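A minimal sketch of the workaround suggested above, written as a standalone Python helper rather than anything shipped with CODECsuite. The function names `strip_mgi_suffix` and `rewrite_fastq` are hypothetical, and the code assumes plain 4-line FASTQ records (no wrapped sequence lines). Only header lines are trimmed, because quality strings can legitimately contain '/'. Note that this also discards whatever follows the '/' in the header (here `1:GAGCCTACTCAGTCAACG`), so it is only appropriate if that suffix is not needed downstream.

```python
import io


def strip_mgi_suffix(header: str) -> str:
    """Drop the first '/' and everything after it from a FASTQ header line."""
    return header.split("/", 1)[0]


def rewrite_fastq(src, dst) -> None:
    """Copy FASTQ records from src to dst, trimming only each record's header.

    Assumes exactly 4 lines per record; sequence, '+', and quality lines
    are copied unchanged.
    """
    for i, line in enumerate(src):
        line = line.rstrip("\n")
        if i % 4 == 0:  # first line of each record is the header
            line = strip_mgi_suffix(line)
        dst.write(line + "\n")


# Small in-memory demonstration using the header from the report above.
demo_in = io.StringIO(
    "@E200002744L1C001R03802620803/1:GAGCCTACTCAGTCAACG\nACGT\n+\nII/I\n"
)
demo_out = io.StringIO()
rewrite_fastq(demo_in, demo_out)
```

For real files, the same function could be called with handles from `gzip.open(path, "rt")` and `gzip.open(out_path, "wt")`, once for R1 and once for R2.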

lys1001s commented 1 year ago

Thank you for the reply! I will try that!

lys1001s commented 1 year ago

@ruolin Hi! Removing the '/' and everything after it seems to work well, but I found another problem.

While trimming, errors like the following appeared: read 1 name E200002744L1C001R04204792525 and read 2 name E200002744L1C024R02200636124 do not match! ... (long)

I searched for E200002744L1C001R04204792525 in the fastq.gz files before and after 'CODEC:demulti'. Before demulti, the read was present in both R1 and R2. After demulti, it is missing from R2.

I am running demulti again just to make sure, and I will report back when it is done. In the meantime, could you check whether the demulti step works correctly with T7 data? Thank you.
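The manual search described above can be generalized into a small pairing check. The following is a hedged sketch, independent of CODECsuite, with hypothetical function names. It assumes 4-line FASTQ records and treats the read name as the header text up to the first '/' or whitespace. `unpaired_names` keeps every name in memory, so for very large files it may be better to run it on a subset.

```python
import io
import re


def read_names(fastq):
    """Yield read names from a 4-line-per-record FASTQ stream.

    The name is the header without the leading '@', cut at the first
    '/' or whitespace character.
    """
    for i, line in enumerate(fastq):
        if i % 4 == 0:
            yield re.split(r"[/\s]", line.strip()[1:], maxsplit=1)[0]


def first_mismatch(r1, r2):
    """Return (record_index, name1, name2) for the first out-of-sync pair, or None."""
    for idx, (n1, n2) in enumerate(zip(read_names(r1), read_names(r2))):
        if n1 != n2:
            return idx, n1, n2
    return None


def unpaired_names(r1, r2):
    """Return (names only in R1, names only in R2) as two sets."""
    s1, s2 = set(read_names(r1)), set(read_names(r2))
    return s1 - s2, s2 - s1


# Toy records: read "b" is present in R1 but absent from R2.
R1_TEXT = "@a/1:X\nA\n+\nI\n@b/1:X\nC\n+\nI\n"
R2_TEXT = "@a/2:Y\nA\n+\nI\n@c/2:Y\nC\n+\nI\n"
mismatch = first_mismatch(io.StringIO(R1_TEXT), io.StringIO(R2_TEXT))
only_r1, only_r2 = unpaired_names(io.StringIO(R1_TEXT), io.StringIO(R2_TEXT))
```

Running this on the files before and after demultiplexing would show whether reads are already unpaired in the input or only become unpaired afterwards.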

ruolin commented 1 year ago

@lys1001s Very sorry for the late response! I now have a fix for the T7 data, but I don't have any real data to test it on. Could you please pull the latest master and test it on your data? Let me know how it goes!

lys1001s commented 1 year ago

@ruolin Great, thank you! Does it include a fix for demulti too? I tried once again, but trimming failed because reads were missing after demulti.

ruolin commented 1 year ago

@lys1001s I see. It looks like I need a minimal test dataset to reproduce your error. Could you share some data with me?

ruolin commented 1 year ago

@lys1001s Just FYI, I reverted the change in master since the fix did not work for you. I am looking forward to test data that I can use to fix your issue properly.

lys1001s commented 1 year ago

@ruolin Thank you for your suggestion. I will prepare the test data. How do you want to receive the data?

ruolin commented 1 year ago

I don't think I need a lot of reads. Maybe you can send them to my email, ruolin@broadinstitute.org.

lys1001s commented 11 months ago

@ruolin Hi! I sent an email with the test data. Did you receive it? Thank you!

ruolin commented 11 months ago

@lys1001s Yes. I got it. I will take a look this week.

ruolin commented 8 months ago

issue fixed in #17