bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

bcbio_prepare_samples.py isn't processing BAM files properly #3055

Closed by amizeranschi 4 years ago

amizeranschi commented 4 years ago

I'm having trouble running bcbio_prepare_samples.py on BAM files: it doesn't seem to create links to them in the output directory. I'm running a recent development version (1.2.0a).

Here's a minimal reproducible example based on one of the bcbio_nextgen tests:


## set these as needed
bcbio_path=/export/home/ncit/external/a.mizeranschi/bcbio_nextgen
test_path=/export/home/ncit/external/a.mizeranschi

## download the bcbio test data
export PATH=${bcbio_path}/tools/bin:${bcbio_path}/anaconda/bin:${PATH}
cd ${test_path}
rm -rf ${test_path}/test_bcbio_cwl
git clone https://github.com/bcbio/test_bcbio_cwl.git
cd ${test_path}/test_bcbio_cwl

## set up a CSV file for some FASTQ and BAM files from the bcbio test data
echo "samplename,description,batch" > test-bcbio.csv
echo "${test_path}/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX_1.fq.gz,7_100326_FC6107FAAXX_fq,test-bcbio" >> test-bcbio.csv
echo "${test_path}/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX_2.fq.gz,7_100326_FC6107FAAXX_fq,test-bcbio" >> test-bcbio.csv
echo "${test_path}/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_wumis_R1.fq.gz,7_wumis,test-bcbio" >> test-bcbio.csv
echo "${test_path}/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_wumis_R2.fq.gz,7_wumis,test-bcbio" >> test-bcbio.csv
echo "${test_path}/test_bcbio_cwl/testdata/100326_FC6107FAAXX/6_100326_FC6107FAAXX.bam,6_100326_FC6107FAAXX,test-bcbio" >> test-bcbio.csv
echo "${test_path}/test_bcbio_cwl/testdata/100326_FC6107FAAXX/6_100326_FC6107FAAXX_2.bam,6_100326_FC6107FAAXX_2,test-bcbio" >> test-bcbio.csv
echo "${test_path}/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam,7_100326_FC6107FAAXX,test-bcbio" >> test-bcbio.csv

## run bcbio_prepare_samples.py on the CSV file
bcbio_prepare_samples.py --out input-data --csv test-bcbio.csv

ls -1 input-data

After running the commands above, I end up with only the FASTQ files linked into the input-data directory. Is this a bug, or do BAM files need to be set up differently?

roryk commented 4 years ago

Heya, you should only be getting FASTQ files in this case, so it is working as intended.

amizeranschi commented 4 years ago

@roryk Thanks for the reply.

Would it be possible to link the BAM files into the output directory as well?

This might avoid some confusion for users who have a mix of FASTQ and BAM files for their samples.
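
In the meantime, the BAM entries could be linked by hand outside of bcbio. The following is only a minimal sketch, assuming the CSV layout from the example above (file path in the first column, header on the first row, no quoted fields containing commas). `link_bams_from_csv` is a hypothetical helper name, not a bcbio command or an option of bcbio_prepare_samples.py.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bcbio): symlink every BAM file listed in
# the first column of a samples CSV into an output directory.
link_bams_from_csv() {
    local csv="$1" out_dir="$2"
    mkdir -p "$out_dir"
    # Skip the header row and keep only rows whose first field ends in .bam.
    awk -F, 'NR > 1 && $1 ~ /\.bam$/ { print $1 }' "$csv" |
    while IFS= read -r bam; do
        if [ -e "$bam" ]; then
            # Resolve the containing directory to an absolute path so the
            # link stays valid regardless of where out_dir lives.
            local abs
            abs="$(cd "$(dirname "$bam")" && pwd)/$(basename "$bam")"
            ln -sf "$abs" "$out_dir/$(basename "$bam")"
        else
            echo "warning: BAM file not found: $bam" >&2
        fi
    done
}
```

With the example above, this might be called as `link_bams_from_csv test-bcbio.csv input-data` after bcbio_prepare_samples.py has finished. Note that it only creates links; it does not merge multiple BAM files per sample or update any generated CSV.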