Closed choosehappy closed 6 years ago
This appears to be limited to the QC directory, as the bam files in the "final" directory are all of different sizes:
root@sib-pc25:/export/big/ajanowcz/livermet/proj/final# ls -l `find . | grep bam | grep "b-" | grep -v bai | sort`
-rwxrwxrwx 1 root root 66920020 Jan 4 15:54 ./888_PD18790b/888_PD18790b-disc.bam
-rwxrwxrwx 1 root root 12916031830 Jan 4 15:54 ./888_PD18790b/888_PD18790b-ready.bam
-rwxrwxrwx 1 root root 10506773 Jan 4 15:54 ./888_PD18790b/888_PD18790b-sr.bam
-rwxrwxrwx 1 root root 58365782 Jan 4 16:22 ./888_PD18793b/888_PD18793b-disc.bam
-rwxrwxrwx 1 root root 11690415573 Jan 4 16:22 ./888_PD18793b/888_PD18793b-ready.bam
-rwxrwxrwx 1 root root 7943727 Jan 4 16:22 ./888_PD18793b/888_PD18793b-sr.bam
-rwxrwxrwx 1 root root 66824776 Jan 4 16:25 ./888_PD18796b/888_PD18796b-disc.bam
-rwxrwxrwx 1 root root 11986694572 Jan 4 16:25 ./888_PD18796b/888_PD18796b-ready.bam
-rwxrwxrwx 1 root root 8930630 Jan 4 16:25 ./888_PD18796b/888_PD18796b-sr.bam
-rwxrwxrwx 1 root root 132817279 Jan 4 16:05 ./888_PD18810b/888_PD18810b-disc.bam
-rwxrwxrwx 1 root root 23758126851 Jan 4 16:05 ./888_PD18810b/888_PD18810b-ready.bam
-rwxrwxrwx 1 root root 10376467 Jan 4 16:05 ./888_PD18810b/888_PD18810b-sr.bam
root@sib-pc25:/export/big/ajanowcz/livermet/proj/final#
As well as other output files (e.g., optitype):
root@sib-pc25:/export/big/ajanowcz/livermet/proj/final# md5sum `find . | grep csv | grep opti | grep b`
68547cacea8baca2f77792b768fe4880 ./888_PD18796b/888_PD18796b-hla-optitype.csv
157466b65a85de73cf982953ea30fdb2 ./888_PD18810b/888_PD18810b-hla-optitype.csv
3c748aa56f665bdd30ebae2e605e87e5 ./888_PD18790b/888_PD18790b-hla-optitype.csv
f060b536986b2d2d2c4e8a4f5d71b5b6 ./888_PD18793b/888_PD18793b-hla-optitype.csv
root@sib-pc25:/export/big/ajanowcz/livermet/proj/final#
Thank you for this detailed report and sorry about the issue. This was due to a recent QC change where we re-used part of the existing QC dictionary, but in shared cases this ended up duplicating it across samples for the outputs. I pushed a fix by explicitly ensuring the dictionary is not shared. If you upgrade to the latest development version (bcbio_nextgen.py upgrade -u development), remove the QC directories in your final output and re-run, it should now put the right QC metrics there.
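For anyone curious how a shared dictionary can produce identical outputs across samples, here is a minimal sketch of the general Python pitfall. It is not the bcbio code: the function names, sample names, and the "metrics" key are all hypothetical and chosen only for illustration.

```python
import copy

def attach_qc_shared(sample_names, qc_template):
    """Buggy pattern: every sample points at the SAME dictionary object."""
    return {name: qc_template for name in sample_names}

def attach_qc_copied(sample_names, qc_template):
    """Safer pattern: each sample gets its own independent deep copy."""
    return {name: copy.deepcopy(qc_template) for name in sample_names}

# Hypothetical sample names, not taken from the report above.
sample_names = ["normal_1", "normal_2"]

shared = attach_qc_shared(sample_names, {"metrics": {}})
shared["normal_1"]["metrics"]["reads"] = 100
# The write is visible through normal_2 as well, because both keys
# refer to one underlying dictionary.

copied = attach_qc_copied(sample_names, {"metrics": {}})
copied["normal_1"]["metrics"]["reads"] = 100
# Here normal_2 keeps its own, still empty, metrics dictionary.
```

A deep copy is used rather than a shallow one because the nested "metrics" dictionary would otherwise still be shared between samples.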
Thank you again for the report and please let us know if you run into any other issues.
yup, that did it, thanks!
root@51853b033dc8:/data/livermet/proj/final# md5sum `find . | grep kraken_summary | grep b | sort `
ce65fe15e645df57e0c32b09813b0899 ./888_PD18790b/qc/kraken/kraken_summary
a61d59d98b7773d40da828435dc66b9d ./888_PD18793b/qc/kraken/kraken_summary
e58075c2951d1724c38f8f236a5e820d ./888_PD18796b/qc/kraken/kraken_summary
2953f382ccc13c34874939873f5f3535 ./888_PD18810b/qc/kraken/kraken_summary
root@51853b033dc8:/data/livermet/proj/final#
I have 4 sets of patients; each has a normal, a primary, and a metastasis sample.
I use a template to call variants on these, and have a csv file which looks like this:
As I understand it, if I label the primary as “b1a” and the metastasis as “b1c”, then I can label the normal as “b1a,b1c” and have it use the sample twice.
This appears to work as expected downstream.
My issue is, the QC directory has the same exact files for all of the normal samples, which is very unlikely given that the bam files produced are themselves different. I can use md5sum to show this:
versus the working directory:
It looks like bcbio is taking the first normal and copying it into all other directories.
versus:
Note, the tumor samples have the correct values; it's only the normals that appear to be affected.
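The md5sum check described above can also be scripted. The following is a rough sketch, assuming the QC files of interest share a file name across sample directories (for example kraken_summary); the helper names md5_of and find_duplicates are made up for this example and are not part of bcbio.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(root, pattern):
    """Group files under root whose names match pattern by MD5 digest.

    Returns only the groups with more than one file, i.e. files that are
    byte-for-byte identical despite living in different directories.
    """
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            by_hash[md5_of(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Calling find_duplicates("final", "kraken_summary") from the project directory would return an empty dictionary if every sample has distinct QC output, and otherwise list the paths that share a checksum.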