Closed Jyoung-0108 closed 10 months ago
According to the log, the error appears to actually be in the igdiscover_igblast
rule. If the below doesn’t fix the issue, can you show the contents of the file /lustre/home/acct-medsn/medsn/IMQ1_BCR/.snakemake/log/2023-07-06T155201.795624.snakemake.log
mentioned in the very last line of the log?
Regading barcodes/UMIs: What IgDiscover calls "barcode" is actually the UMI, so your thinking is correct. However, IgDiscover expects the UMI to be at the 5' end of the read, so if you set barcode_length_5prime
to 10, the first 10 bases of the 10X barcode would be interpreted as the UMI. I think this would mean that all reads from one GEM would be merged into a single one by IgDiscover. This might even explain the above igdiscover_igblast
error.
My suggestion is to try setting barcode_length_5prime
to 26 to make IgDiscover treat the combined 10X barcode plus UMI as UMI. (Disclaimer: I have not used IgDiscover myself with 10X data.)
According to the log, the error appears to actually be in the
igdiscover_igblast
rule. If the below doesn’t fix the issue, can you show the contents of the file/lustre/home/acct-medsn/medsn/IMQ1_BCR/.snakemake/log/2023-07-06T155201.795624.snakemake.log
mentioned in the very last line of the log?Regading barcodes/UMIs: What IgDiscover calls "barcode" is actually the UMI, so your thinking is correct. However, IgDiscover expects the UMI to be at the 5' end of the read, so if you set
barcode_length_5prime
to 10, the first 10 bases of the 10X barcode would be interpreted as the UMI. I think this would mean that all reads from one GEM would be merged into a single one by IgDiscover. This might even explain the aboveigdiscover_igblast
error.My suggestion is to try setting
barcode_length_5prime
to 26 to make IgDiscover treat the combined 10X barcode plus UMI as UMI. (Disclaimer: I have not used IgDiscover myself with 10X data.)
Thanks for your reply. And I've tried to corrected the barcode_length_5prime
to 26 but it didn't fix my issue. I still got the problem. and i've attached the log file here.
2023-07-07T095124.994995.snakemake.log
hello,i've updated my issue and i'm waiting for your reply. Thanks!
Can you please run igdiscover igblastwrap --sequence-type=Ig --threads=16 iteration-01/database reads/sequences.fasta.gz > /dev/null
and paste the output here? There should be an error message.
Thanks! I ran the commands and got the following output.
INFO: Running IgBLAST on input reads
Traceback (most recent call last):
File "/lustre/home/acct-medsn/medsn/.conda/envs/igdiscover/bin/igdiscover", line 10, in
Thanks, that is helpful. It appears that no reads remain after preprocessing. That is, IgBLAST works on zero input sequences. There are a couple of preprocessing steps and it is unclear to me at the moment which one filters out all reads. What are the contents of reads/2-pear.log
?
I uploaded the log file here. 2-pear.log
Ok, that step went fine. What are the contents of the files stats/trimmed.json
and stats/groups.json
?
... and of reads/4-sequences.fasta.gz.log
?
it seemed that all sequences were trimmed. groups.json: { "unique_barcodes": 0, "barcode_singletons": 0, "groups_written": 0, "group_size_1": 0, "group_size_2": 0, "group_size_3plus": 0 }
and trimmed.json: {"trimmed": 21081671}
The large number in trimmed.json
is actually the number of sequences remaining after trimming, so it is good that it is high. ("Trimming" actually refers to primer removal, but since you haven’t configured any primers in your configuration, no trimming is done anyway.)
The problem appears to be in the "grouping" step where IgDiscover mainly groups together sequences that have the same UMI, but it also does some minor filtering. In particular, it filters out sequences that are too short, and it seems that in your case, it considers all sequences to be too short:
INFO: 21081671 sequences in input
INFO: 0 sequences long enough
Your configuration file has this:
minimum_merged_read_length: 300
So it appears that all your sequences end up as being shorter than 300 bp.
What is actually the read length of your 10X reads? Maybe the data isn’t suitable for IgDiscover.
Hey,
I've been running IgDiscover (version 0.15.1) with my sequencing data, however, the run fails each time in this rule read_length_histogram. The log printed on Terminal and my configuration file are attached here.
log.txt igdiscover.txt
I also have another question: The structure of my Ig sequences is and the 10X barcode is 16 nt-long, which the UMI is 10 nt-long. I'm wondering if I should set the 'barcode_length_5prime: ' to 10 or 16? I set it to 10. My sequencing dataset includes 2 sample and i think UMI represents the real UMI here in igdiscover, am i right about it? Thank you a lot!