Closed CWYuan08 closed 2 years ago
Can you post the full command you used and the first few lines of the fastq file(s)?
Thank you for prompt reply!
I used:
umi_tools extract --bc-pattern=CCCCCCCCNNNNNNNN \
  --stdin SRR4199344_1.fastq.gz \
  --stdout SRR4199344.R1.extracted.fq.gz \
  --read2-stdout \
  --read2-in SRR4199344_2.fastq.gz \
  --filter-cell-barcode --whitelist=barcodes.fill.short.txt
and the input:
@SRR4199344.1 1 length=150
ATGCCGAAGCCCCCCCATGAAAAAAATACTTTTCTTTTTTTTTCTTTTTCTTCTTTTATGTATTTTTGTTTTTTATTCTTTTTTGTCTTTATTTGTTTATTTATATTTTCATTTTCTTTATTCTCTTATTTCTCATTTATATTTTATTTA
+SRR4199344.1 1 length=150
AAAFF-F---7-7--77--------<----<----<------<-77---7AF--7-7--------7-----7-7-------7-7------------7--77--7----77--7--7----7--7--7-----7-----------------
@SRR4199344.2 2 length=150
ACAGCAGACGGCCGCCTTATTCTTTTACAATTTTTTTTTTTTTATTAATAATCCTTGGGTTCTCCGCACAGAGGGGGATCGGGCAGGGTCAGGAGACAAGAGGGGGGGGAAGGACAGCAAAAAAAAAAGTAAACAAAGCTCTCGGGTTCA
+SRR4199344.2 2 length=150
AAFFFJJA---77-------------------------<<F-<A77-7-7---7---)7--)7-7)-<))7--<7---7))))))<-))77)7-)--<<-AF-7))---))777-<)7<7--7---7-----7--<7-----)))-7<)-
There should be a line in the error message underneath
U.error("parsing error: expected '@' in line %s" % line1)
Here, line1 isn't referring to the first line of your input file, but to the current line being read from file 1.
You should have an output under this that shows the offending line. One way this error might be caused is if you have an empty line at the end of your fastq file.
Hi, I checked the end of my file and it isn't empty. Should I search through the whole file? Many thanks
Hi, I tried using bbmap to remove empty reads, and it looks like the file doesn't contain any:
Input: 49833473 reads 7475020950 bases
Short Read Discards: 0 reads (0.00%) 0 bases (0.00%)
Output: 49833473 reads (100.00%) 7475020950 bases (100.00%)
Could you please advise on other ways to check this?
Thank you very much!
You could try
zcat SRR4199344_1.fastq.gz | awk 'NR % 4 == 1' | grep -v "^@" | wc -l
This should print the number of reads whose name line doesn't have "@" at the start (drop the final wc -l to see the names themselves).
You might also check that the output of
zcat SRR4199344_1.fastq.gz | wc -l
divides exactly by 4.
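The two checks above (header lines starting with "@", line count divisible by 4) can also be done in one pass that reports where the first problems occur. The following is only a rough sketch, not part of umi_tools: the function name check_fastq and the small demo file are made up for illustration, and it assumes a plain 4-line-per-record FASTQ file, optionally gzipped.

```python
import gzip
import os
import tempfile


def check_fastq(path, max_problems=10):
    """Return a list of (line_number, message) for structural problems.

    Hypothetical helper: assumes exactly 4 lines per record.
    """
    problems = []
    opener = gzip.open if path.endswith(".gz") else open
    n_lines = 0
    record = []
    with opener(path, "rt") as handle:
        for n_lines, line in enumerate(handle, start=1):
            record.append(line.rstrip("\r\n"))
            if len(record) < 4:
                continue
            header, seq, plus, qual = record
            start = n_lines - 3  # line number of this record's header
            if not header.startswith("@"):
                problems.append((start, "header does not start with '@'"))
            if not plus.startswith("+"):
                problems.append((start + 2, "separator does not start with '+'"))
            if len(seq) != len(qual):
                problems.append((start + 1, "sequence and quality lengths differ"))
            record = []
            if len(problems) >= max_problems:
                return problems
    if record:
        # Leftover lines: the line count is not a multiple of 4,
        # e.g. because of an empty line at the end of the file.
        first_leftover = n_lines - len(record) + 1
        problems.append((first_leftover, "file ends with an incomplete record"))
    return problems


# Demo on a tiny made-up file with a trailing empty line.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.fastq.gz")
with gzip.open(demo_path, "wt") as fh:
    fh.write("@r1\nACGT\n+\nFFFF\n\n")
demo_problems = check_fastq(demo_path)
print(demo_problems)
```

For the real data this would be called as check_fastq("SRR4199344_1.fastq.gz"); an empty list means none of these structural problems were found.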
Thank you, I tried
zcat SRR4199344_1.fastq.gz | awk 'NR % 4 == 1' | grep -v "^@" | wc -l
which gives 0, and
zcat SRR4199344_1.fastq.gz | wc -l
which gives 199333892, which is divisible by 4.
I am still not sure why there is an error..
Thank you again
I have downloaded SRR4199344 and am trying to run the analysis myself. I'll let you know what answer I come to.
Seems to run fine for me, but I was doing it without a whitelist. What options were you using for creating the whitelist?
Thank you very much!
I have attached my whitelist (barcodes.fill.short.txt). Do you mind sharing your command with me?
Best, CW
Okay, I just ran with your whitelist, and it worked fine without any error. Is it possible that your input files are corrupted in some way?
The MD5 checksums of the files I'm using are:
$ md5sum SRR4199344*gz
9115da2f11b8e9347b74c3862f48ccf0 SRR4199344_1.fastq.gz
964dbcaf93cb129832aae250627a033a SRR4199344_2.fastq.gz
I downloaded these from ENA.
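For anyone without md5sum available, the same comparison can be sketched in Python. This is only an illustration: the helper names md5_of_file and compare_checksums are made up here, and the expected values are simply the two checksums quoted above.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums quoted in this thread for the ENA downloads.
EXPECTED = {
    "SRR4199344_1.fastq.gz": "9115da2f11b8e9347b74c3862f48ccf0",
    "SRR4199344_2.fastq.gz": "964dbcaf93cb129832aae250627a033a",
}


def compare_checksums(expected, directory="."):
    """Map each file name to True if it exists and its MD5 matches."""
    results = {}
    for name, md5 in expected.items():
        path = os.path.join(directory, name)
        results[name] = os.path.exists(path) and md5_of_file(path) == md5
    return results


if __name__ == "__main__":
    for name, ok in compare_checksums(EXPECTED).items():
        print(name, "OK" if ok else "missing or different")
```

A file reported as "missing or different" would be a candidate for re-downloading.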
BTW the command I used was:
umi_tools extract --bc-pattern=CCCCCCCCNNNNNNNN --stdin SRR4199344_1.fastq.gz --stdout SRR4199344.R1.extracted.fq.gz --read2-stdout --read2-in SRR4199344_2.fastq.gz --filter-cell-barcode --whitelist=barcodes.fill.short.txt
Dear Ian,
Many thanks. I re-downloaded my input files, and this time the command ran with no errors. It reported INFO Input Reads: 49833473 and INFO Filtered cell barcode: 49833473, but my SRR4199344.R1.extracted.fq.gz looks empty. Do you know why this is? Should I drop the whitelist?
Thank you again and happy new year!
Best, CW
Sorry, I never looked inside your barcode whitelist. Where did you find it? It's not formatted correctly. You can see the format here: https://umi-tools.readthedocs.io/en/latest/reference/whitelist.html. Everything after the first column is optional. Basically, the whitelisted barcodes need to be in the first column.
Dear Ian,
thank you very much! Do I have to run this whitelist command to generate the file? I know the barcodes, could I just have 1 column (the barcodes) and then run the previous command again?
Many thanks CW
Yes, you can just have a 1 column file with the barcodes and run the extract command again.
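Before re-running extract, it may help to confirm that the first column of the whitelist really contains barcodes. The sketch below is hypothetical and not part of umi_tools: the name check_whitelist is made up, it assumes columns are tab-separated, and it assumes barcodes are 8 bases long because the pattern CCCCCCCCNNNNNNNN has eight C positions.

```python
def check_whitelist(lines, barcode_len=8):
    """Return (line_number, first_column) for lines whose first
    tab-separated column does not look like a cell barcode.

    Assumptions (not taken from umi_tools itself): tab-separated
    columns, barcodes of length barcode_len over the letters ACGTN.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        first = line.split("\t")[0]
        if len(first) != barcode_len or set(first) - set("ACGTN"):
            bad.append((number, first))
    return bad


# Made-up examples: a one-column line, a line with extra columns,
# and a line where the barcode is not in the first column.
example = ["ACGTACGT\n", "TTTTGGGG\tTTTTGGGA\t10\t1\n", "cell1 ACGTACGT\n"]
print(check_whitelist(example))
```

On the real file this could be run as check_whitelist(open("barcodes.fill.short.txt")); any lines it returns would need fixing so that the barcode comes first.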
Hi, I tried to run umi_tools extract, but I got the error "U.error("parsing error: expected '@' in line %s" % line1)". I checked my line 1, and it does start with "@". Could you please help me with this? Thank you very much!