lys1001s closed this issue 8 months ago
@lys1001s Thanks for reporting this issue. We have never tested CODEC on an MGI system, and I suspect the cause is the different read-naming convention. I may have a quick fix for you. In the meantime, you could check whether removing the '/' and everything after it from the read names works.
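The workaround suggested above (dropping the '/' and everything after it) could be sketched as follows. This is only an illustration, not part of CODEC: the function names `strip_mate_suffix` and `normalize_fastq_lines` are hypothetical, and the sketch assumes a standard four-line-per-record FASTQ. Only header lines are edited, because quality strings may legitimately contain '/'. Note that this also discards the barcode that follows the colon in the header.

```python
def strip_mate_suffix(header: str) -> str:
    """Drop the first '/' and everything after it from a FASTQ header line."""
    return header.split("/", 1)[0]


def normalize_fastq_lines(lines):
    """Yield FASTQ lines with the mate suffix removed from header lines only.

    Assumes four lines per record, so every fourth line (index 0, 4, 8, ...)
    is a header. Sequence, '+', and quality lines are passed through unchanged.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield strip_mate_suffix(line) if i % 4 == 0 else line


if __name__ == "__main__":
    record = [
        "@E200002744L1C001R03802620803/1:GAGCCTACTCAGTCAACG",
        "ACGT",
        "+",
        "FF/F",  # the '/' here is a quality character and must be kept
    ]
    for out in normalize_fastq_lines(record):
        print(out)
```

After this step the R1 and R2 headers for the same fragment share the same name, which is what the name-matching check in the trimming step compares.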
Thank you for the reply! I will try that!
@ruolin Hi! Removing the '/' and everything after it seems to work well, but I found another problem.
While trimming, these errors came up: read 1 name E200002744L1C001R04204792525 and read 2 name E200002744L1C024R02200636124 do not match! ... (long)
I searched for E200002744L1C001R04204792525 in the fastq.gz files before and after 'CODEC:demulti'. Before demulti, the read was present in both R1 and R2. After demulti, it is missing from R2.
I am running demulti again to make sure, just in case. I will reach out again when it is done, but in the meantime, could you check whether the demulti process works correctly with T7 data? Thank you
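The manual check described above (looking for one read name in both files) can be generalized to list every read name that appears in only one of the two files. The following is a minimal sketch, not CODEC code: `read_names` and `unpaired_names` are hypothetical helpers, the file paths in the usage note are placeholders, and the sketch assumes gzip-compressed FASTQ with four lines per record and names that fit in memory as a set.

```python
import gzip


def read_names(path):
    """Collect read names from a gzipped FASTQ file.

    The name is the header text after '@', truncated at the first
    whitespace and at the first '/', so '/1' and '/2' mates compare equal.
    """
    names = set()
    with gzip.open(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0:  # header line of each four-line record
                names.add(line[1:].split()[0].split("/", 1)[0])
    return names


def unpaired_names(r1_path, r2_path):
    """Return (names only in R1, names only in R2)."""
    r1 = read_names(r1_path)
    r2 = read_names(r2_path)
    return r1 - r2, r2 - r1
```

For example, `only_r1, only_r2 = unpaired_names("R1.fastq.gz", "R2.fastq.gz")` would return two sets; a non-empty `only_r1` on the post-demulti files would be consistent with reads going missing from R2.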
@lys1001s Very sorry for the late response! I now have a fix for the T7 data, but I don't have any real data to test it on. Could you please pull the latest master and test it on your data? Let me know how it goes!
@ruolin Great, thank you! Does it include a fix for demulti too? I tried once again, but it failed at trimming due to missing reads after demulti.
@lys1001s I see. It looks like I need a minimal test dataset to reproduce your error. Would you be willing to share some data with me?
@lys1001s Just FYI, I reverted the change in master since the fix did not work for you. I am looking forward to test data that I can use to fix your issue properly.
@ruolin Thank you for your suggestion. I will prepare the test data. How do you want to receive the data?
I don't think I need a lot of reads. Maybe you can send to my email ruolin@broadinstitute.org.
@ruolin Hi! I sent the email with the test data. Did you receive it? Thank you!
@lys1001s Yes. I got it. I will take a look this week.
issue fixed in #17
Hello, I have been using CODEC Suite successfully. Until now, I've had no issues using CODEC on samples sequenced with NovaSeq. However, I'm encountering a problem while trimming the FASTQ files of samples sequenced with MGI T7 and wanted to reach out. The error log is below:
read 1 name E200002744L1C001R03802620803/1:GAGCCTACTCAGTCAACG and read 2 name E200002744L1C001R03802620803/2:GTGTCGAACACTTGACGG do not match!
This error appears for all reads, and the log size is several gigabytes.
Below are the headers from the demultiplexed output of the sample sequenced with T7, for your reference. These are the first lines of R1.fastq.gz and R2.fastq.gz, respectively:
@E200002744L1C001R03802620803/1:GAGCCTACTCAGTCAACG
@E200002744L1C001R03802620803/2:GTGTCGAACACTTGACGG
Thank you.