Open hukai916 opened 2 years ago
Ill need more details. but is it possible one read fails decoding and the filter outgoing qc fail
or filter incoming qc fail
or is on?
What info do you need? I am attaching my test data and code here: https://www.dropbox.com/sh/6idqunobck72q5j/AAClfpkwhusMVDPNx6HU7wSYa?dl=0
Note that bam2 is missing the last record where bam1 and bam3 are correct. Let me know if you need other info. Thanks!
@hukai916 so all bam files in the dropbox folder are outputs from the cram file? Can you please post the json config file you use and which version of pheniqs you are using?
log_decode2.txt seems to be the one showing a missing record. 450000 are reported in incoming and only 449999 are reported in sample. being the last record does point to some thread synchronization issue but obviously an extremely rare one since I have never encountered it. How many cores is this running on?
Hi, all bams are created from the same cram using the same pheniqs command. The missing-last-record cases occur occasionally, roughly 10%-20%, no matter how many cores I use. I will give more info after the holidays.
Can you please post pbmc_500_5p_3_aa.json and I will try and replicate it.
L.
Hi L., I have uploaded the json file to the same folder. The Pheniqs I used is: pheniqs version 2.1.0 @moonwatcher
--Kai
Hello Kai
Just a quick update., I have been trying to reproduce this but have so far been unsuccessful. I tried both with a build of pheniqs 2.1.0 and with head.
A side comment: using an extension of ".bam1" pheniqs does not detect that you actually wanted bam encoding as so it will revert to the default SAM output. This is why the files you provided are bigger than I expected. They are actually uncompressed, simple text in SAM format. You can override the default output format with --format bam
.
Are you observing this on MacOS or Linux? What exact platform? might also be useful to know what dependencies. Did you build the binary yourself or installed it from Conda?
It is possible this is related to one of the dependencies. Since this is so tricky to replicate, will you be willing to test on custom build? I can provide instructions if you are interested.
Thanks L.,
I installed Pheniqs with conda on a Ubuntu Xenial container (Docker). The container contains solely miniconda, Pheniqs, pysam, and SAMTools. The container can be pulled by:
docker pull hukai916/pheniqs_xenial:0.2
I can perform more tests, pls share instructions. Thanks!
--Kai
Hi developers,
I encountered a mysterious issue when using Pheniqs for barcode correction.
Basically, when performing the correction, I first prepare a CRAM file containing barcodes to be corrected and a json file according to the tutorial, then run Pheniqs. However, the resulting BAM files is one record less than the CRAM, and the missing one seems to be always the last record in the CRAM.
This happens only occasionally. For example, I run the same code for 6 times and capture one (bam3):
Not a big matter, but still annoying, I can provide more info if you would like to look into it.
Thanks,
--Kai