Closed prmunn closed 1 year ago
Hmmm, OK this one's on me.
In the lines below, umi_tools whitelist obtains the initial whitelist, then (optionally) error corrects cell barcodes to it. It then checks whether a whitelist was created and, if so, writes it out with counts per barcode. If no whitelist exists, it returns a warning/error explaining that no local minima could be found in the density plot of barcode counts, which is how the knee is identified with --knee-method=density.
https://github.com/CGATOxford/UMI-tools/blob/289b9cc87f35bd06249ef6ae680e590524bc83f3/umi_tools/whitelist.py#L437-L491
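To illustrate why a density-based knee can fail, here is a minimal sketch of searching for local minima in a list of density values. It is not the umi_tools implementation; the function name `local_minima` and the example values are made up for illustration.

```python
def local_minima(density):
    """Return indices of strict interior local minima in a sequence of density values."""
    return [
        i
        for i in range(1, len(density) - 1)
        if density[i] < density[i - 1] and density[i] < density[i + 1]
    ]


# Illustrative values only: two humps with a dip between them.
bimodal = [0.1, 0.6, 0.9, 0.4, 0.2, 0.5, 0.8, 0.3]
# Illustrative values only: a single hump, so there is no dip to use as a knee.
unimodal = [0.1, 0.4, 0.9, 0.7, 0.5, 0.3, 0.2, 0.1]

print(local_minima(bimodal))
print(local_minima(unimodal))
```

When the list of minima is empty there is nothing to place the threshold on, which corresponds to the "no local minima" case described above.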
The issue arises when using --knee-method=density --ed-above-threshold=correct and no knee is identified, so no whitelist is generated. The warning/error should obviously be raised before the error correction step.
In short, the identification of the knee hasn't worked and this isn't caught at the right point.
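A minimal sketch of the ordering that would catch this, assuming a knee finder that returns None when it fails. The names `build_whitelist` and `find_knee` are hypothetical and are not the actual umi_tools functions; the error-correction step is a placeholder.

```python
import warnings


def build_whitelist(counts, find_knee, correct_errors=True):
    """Check for a missing whitelist before any error correction is attempted.

    `find_knee` is a hypothetical callable that returns an iterable of
    barcodes, or None when no knee could be identified.
    """
    whitelist = find_knee(counts)
    if whitelist is None:
        # Fail early with a clear message instead of passing None onwards.
        warnings.warn("No knee could be identified; no whitelist generated")
        return None, None

    error_map = None
    if correct_errors:
        # Placeholder for error correction. Calling list(None) at this point
        # is what produces "TypeError: 'NoneType' object is not iterable".
        error_map = {barcode: [] for barcode in list(whitelist)}
    return set(whitelist), error_map
```

With a finder that returns None, the function warns and returns `(None, None)` rather than raising a TypeError.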
To remedy this, I would suggest using --knee-method=distance, which is the default and should be more robust. You can inspect the plot afterwards to check you're happy with it.
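For intuition, here is a generic distance-style knee heuristic: sort the counts, build the normalised cumulative curve, and pick the rank where the curve sits furthest above the diagonal. This is a sketch under that assumption and may differ from what umi_tools does internally; `knee_by_distance` and the example counts are made up.

```python
def knee_by_distance(counts):
    """Return the number of top-ranked barcodes at the point where the
    normalised cumulative count curve is furthest above the diagonal.

    Returns None if there are no counts to work with.
    """
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    n = len(ordered)
    if n == 0 or total == 0:
        return None

    best_rank, best_gap, running = 0, float("-inf"), 0
    for rank, count in enumerate(ordered, start=1):
        running += count
        # Vertical gap between the cumulative curve and the diagonal y = x.
        gap = running / total - rank / n
        if gap > best_gap:
            best_rank, best_gap = rank, gap
    return best_rank


# Illustrative counts: three abundant barcodes followed by low-count noise.
example_counts = [1000, 950, 900, 5, 4, 3, 2, 1, 1, 1]
print(knee_by_distance(example_counts))
```

Unlike a search for local minima, this always returns a rank for non-empty input, which is one reason a distance-style method tends to be more robust. It is still worth checking the plot.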
If you do want to stick with --knee-method=density, you can leave off --error-correct-threshold=2 --ed-above-threshold=correct to get around the above error, and include --allow-threshold-error so that the knee plots are generated. You can then inspect the plots and manually set the knee threshold in a subsequent run with --set-cell-number.
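Conceptually, fixing the cell number by hand amounts to keeping the N most abundant barcodes. A minimal sketch of that idea, with a hypothetical helper `top_n_barcodes` and made-up counts (this is not umi_tools code):

```python
from collections import Counter


def top_n_barcodes(barcode_counts, cell_number):
    """Return the `cell_number` most abundant barcodes, most abundant first."""
    ranked = Counter(barcode_counts).most_common(cell_number)
    return [barcode for barcode, _ in ranked]


# Illustrative counts only.
example = {"AAAA": 500, "CCCC": 450, "GGGG": 3, "TTTT": 1}
print(top_n_barcodes(example, 2))
```

The value of N would come from reading the knee plots, as described above.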
I'd favour taking the --knee-method=distance approach.
In the meantime, I'll update whitelist so this error is caught properly.
One final comment: that barcode pattern looks very long. Do you really have a 26bp cell barcode?
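A quick way to check how many cell-barcode and UMI bases a string-method pattern declares is to count the C (cell barcode) and N (UMI) characters. The helper `pattern_lengths` is hypothetical; the pattern is the one passed to --bc-pattern in this thread.

```python
from collections import Counter


def pattern_lengths(bc_pattern):
    """Count cell-barcode (C), UMI (N) and other positions in a --bc-pattern string."""
    tally = Counter(bc_pattern)
    cell, umi = tally["C"], tally["N"]
    return {"cell": cell, "umi": umi, "other": len(bc_pattern) - cell - umi}


# Pattern copied from the command in this thread.
pattern = "CCCCCCCCCCCCCCCCNNNNNNNNNNCCCCCCCCCC"
print(pattern_lengths(pattern))
```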
Thanks for responding so quickly. The knee method = distance appears to have worked. I'll try out your other suggestion with knee method = density and manually setting cell number.
When I run whitelist using the following command:
umi_tools whitelist --knee-method=density --method=reads --plot-prefix Mix1_predictBC --allow-threshold-error --extract-method string --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNNCCCCCCCCCC --error-correct-threshold=2 --ed-above-threshold=correct -L Mix1_predictedBCwhitelist.log -I Mix1_I2_I1_padUMI_R2.fastq.gz -S Mix1_predictedBCwhitelist.txt
I get the following error:
/programs/UMI-tools/lib64/python3.6/site-packages/umi_tools/whitelist_methods.py:202: UserWarning: Attempted to set non-positive left xlim on a log-scaled axis.
Invalid limit will be ignored.
  fig3.set_xlim(0, len(counts)*1.25)
Traceback (most recent call last):
  File "/programs/UMI-tools/bin/umi_tools", line 8, in <module>
    sys.exit(main())
  File "/programs/UMI-tools/lib64/python3.6/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/programs/UMI-tools/lib64/python3.6/site-packages/umi_tools/whitelist.py", line 455, in main
    resolution_method=options.ed_above_threshold)
  File "/programs/UMI-tools/lib64/python3.6/site-packages/umi_tools/whitelist_methods.py", line 543, in errorDetectAboveThreshold
    cell_whitelist = list(cell_whitelist)
TypeError: 'NoneType' object is not iterable
The predicted whitelist file has a size of zero. However, when I run the same command on a test dataset consisting of 100,000 records from the original dataset, it runs without error and produces my predicted whitelist file.
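For reference, a test subset like this can be made by copying the first N records of a gzipped FASTQ file. This is a sketch assuming standard 4-line records; `head_fastq` is a hypothetical helper, and how the 100,000-record subset in this thread was actually produced is not stated.

```python
import gzip
from itertools import islice


def head_fastq(src_path, dst_path, n_records):
    """Copy the first `n_records` FASTQ records (4 lines each) from one
    gzipped file to another and return the number of complete records copied."""
    lines_written = 0
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        # islice streams the file, so the full input is never held in memory.
        for line in islice(src, 4 * n_records):
            dst.write(line)
            lines_written += 1
    return lines_written // 4
```

Note that a subset can have a differently shaped barcode count distribution from the full dataset, so knee detection succeeding on the subset does not imply it will succeed on the full run.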
I've attached the log file from the original run that failed (Mix1_predictedBCwhitelist.log). Please help.