ENCODE-DCC / chip-seq-pipeline2

ENCODE ChIP-seq pipeline
MIT License

Problem about build_genome_data.sh #279

Closed: Mozillian1 closed this issue 1 year ago

Mozillian1 commented 2 years ago

I'm trying to use this script to build my own genome data. I found that if I set REF_FA=somegenome.fa, the script removes the '.fa' file as a temporary file at lines 292-294 and records the corresponding '.gz' file as ref_fa in the final TSV file. However, I could not find any lines that compress the original FASTA file. As a result, when I run analyses with the built genome TSV, I get an error saying that the '.fa.gz' file cannot be found. I suggest adding a line like gzip -nc ${REF_FA_PREFIX} > ${REF_FA_PREFIX}.gz after line 212. Thank you for this pipeline.

leepc12 commented 2 years ago

Thanks for reporting this. I will fix it in the next release.

leepc12 commented 1 year ago

Sorry for the very late reply; I didn't have time to work on this issue. As a workaround, please gzip the FASTA file before submitting it to the builder so that it has a .fa.gz suffix.
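For anyone applying this workaround, here is a minimal sketch of the compression step. The function name ensure_gzipped_fasta is hypothetical and not part of the pipeline; it only wraps the gzip -nc call suggested earlier in the thread, and it keeps the uncompressed original in place.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of build_genome_data.sh): make sure a FASTA
# file is gzip-compressed and print the path of the compressed file.
ensure_gzipped_fasta() {
  local fa="$1"
  case "$fa" in
    *.gz)
      # Already compressed; nothing to do.
      printf '%s\n' "$fa"
      ;;
    *)
      # -n leaves the original name and timestamp out of the gzip header;
      # -c writes to stdout, so the uncompressed original is kept.
      gzip -nc "$fa" > "$fa.gz" && printf '%s\n' "$fa.gz"
      ;;
  esac
}
```

The printed path (for example somegenome.fa.gz) is what REF_FA would then be set to before running the builder.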