CCBR / Pipeliner

An open-source and scalable solution to NGS analysis powered by the NIH's Biowulf cluster.

Fix mmul FASTQ screen; currently not using mmul as a species #477

Closed: slsevilla closed this issue 1 year ago

slsevilla commented 1 year ago

When mmul is selected as the reference, FASTQ Screen does not include this species in its report.

Config location:

/data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen.conf

References included:

DATABASE     Human   /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/hg19/hg19                                         
DATABASE     Mouse  /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/mm9/mm9                                                        
#DATABASE     Phix    /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/PhiX/phix
#DATABASE     Salmo   /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/Salmo_salar_clone/Salmo_salar
#DATABASE     Uni_Vec /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/UniVec_vectors/UniVec_vectors                
DATABASE     Bacteria        /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/Bacteria/bacteria
DATABASE     Fungi        /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/Fungi/fungi
DATABASE     Virus  /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/Virus/virus
#DATABASE     rRNA  /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/rRNA/rRNA
#DATABASE     Lambda  /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/Lambda/Lambda  

Location where the index should be created:

/data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/mmul_8

The config should be updated to include mmul:

DATABASE     Mmul   /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/mmul_8/mmul8                                         
DATABASE     Human   /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/hg19/hg19                                         
DATABASE     Mouse  /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/mm9/mm9                                                        
#DATABASE     Phix    /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/PhiX/phix
#DATABASE     Salmo   /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/Salmo_salar_clone/Salmo_salar
#DATABASE     Uni_Vec /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/UniVec_vectors/UniVec_vectors                
DATABASE     Bacteria        /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/Bacteria/bacteria
DATABASE     Fungi        /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/Fungi/fungi
DATABASE     Virus  /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/Virus/virus
#DATABASE     rRNA  /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/rRNA/rRNA
#DATABASE     Lambda  /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/Lambda/Lambda  
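
A quick way to confirm which entries FASTQ Screen will actually read is to list the uncommented DATABASE lines. The following is a minimal Python sketch, not part of Pipeliner; the function name `active_databases` is hypothetical, and it assumes each entry has the form `DATABASE <label> <index basename>` with `#` marking a disabled line, as in the listings above.

```python
def active_databases(conf_text):
    """Return {label: index_basename} for uncommented DATABASE lines."""
    databases = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and disabled entries such as "#DATABASE ...".
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "DATABASE":
            databases[fields[1]] = fields[2]
    return databases


# Three lines copied from the proposed config above.
example = """\
DATABASE     Mmul   /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/mmul_8/mmul8
DATABASE     Human   /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/hg19/hg19
#DATABASE     Phix    /data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/PhiX/phix
"""

dbs = active_databases(example)
print(sorted(dbs))  # ['Human', 'Mmul']
```

Running the same function on the contents of the real config file would show whether Mmul is among the active entries.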

slsevilla commented 1 year ago

Adding mmul to the shared config would affect every species, and not all of them may want that overlap, so my suggestion is to create a separate fastq_screen config for mmul:

  1. Create new FASTQ_SCREEN.conf
  2. Edit JSON
  3. Create Index

1. Create new FASTQ_SCREEN.conf

/data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_mmul8.conf

2. Edit JSON

https://github.com/CCBR/Pipeliner/blob/activeDev/Mmul_8.0.1.json

# from
"FASTQ_SCREEN_CONFIG": "/data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen.conf",

# to
"FASTQ_SCREEN_CONFIG": "/data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_mmul8.conf",

3. Create index

/data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_db/mmul_8
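
Step 2 above can also be scripted rather than edited by hand. This is only a sketch: the helper `set_key_everywhere` is hypothetical, and because the nesting of Mmul_8.0.1.json is not shown in this thread, the example dictionary below is an assumed layout used purely for illustration. The helper replaces the key wherever it occurs, so it does not depend on that layout.

```python
import json

OLD_CONFIG = "/data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen.conf"
NEW_CONFIG = "/data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_mmul8.conf"


def set_key_everywhere(node, key, new_value):
    """Recursively replace every occurrence of `key`; return how many were replaced."""
    count = 0
    if isinstance(node, dict):
        for k, v in list(node.items()):
            if k == key:
                node[k] = new_value
                count += 1
            else:
                count += set_key_everywhere(v, key, new_value)
    elif isinstance(node, list):
        for item in node:
            count += set_key_everywhere(item, key, new_value)
    return count


# Assumed (hypothetical) nesting; only the key name and paths come from the thread.
genome = {"references": {"rnaseq": {"FASTQ_SCREEN_CONFIG": OLD_CONFIG}}}

changed = set_key_everywhere(genome, "FASTQ_SCREEN_CONFIG", NEW_CONFIG)
print(changed)
print(json.dumps(genome, indent=2))
```

For a real file, the dictionary would come from `json.load` and be written back with `json.dump` after checking the replacement count.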

slsevilla commented 1 year ago

Code updated with commit 852041b55890060cea58d1eaeed4823f33a82e42

Code location: /data/CCBR_Pipeliner/4.0.4/Pipeliner

slsevilla commented 1 year ago

Despite making this update, the run.json is still being created with FASTQ_SCREEN_CONFIG set as

"FASTQ_SCREEN_CONFIG": "/data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen.conf",

I edited the run.json to use the correct config; however, every time a run or dry run is deployed, the file is rewritten. I hard-coded it to work correctly for our current run, but this needs to be corrected:

"FASTQ_SCREEN_CONFIG": "/data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen_mmul8.conf"

I can't figure out where the issue is or why the run.json is being created incorrectly. Can you review, @kopardev @skchronicles, please?

skchronicles commented 1 year ago

Hey @slsevilla,

I hope all is going well on your side!

Reference files shared across genomes come from the standard-bin.json file. Nested within each pfamily (e.g. exomeseq, rnaseq, etc.), there is a key for this config file. You will need to edit that file for the change to be picked up by the pipelines.

Please let me know what you think.

Best Regards, @skchronicles
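
To see every place the key is defined, a small script can walk a loaded JSON document and report the path to each occurrence. This is a sketch under stated assumptions: `find_key_paths` is a hypothetical helper, and the `standard_bin` dictionary below is an invented layout (only the pfamily names, the key name, and the config path come from this thread), so the real standard-bin.json may be nested differently.

```python
def find_key_paths(node, key, path=()):
    """Yield (path, value) for every occurrence of `key` in nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield path + (k,), v
            else:
                yield from find_key_paths(v, key, path + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key_paths(item, key, path + (i,))


# Hypothetical layout: one entry per pfamily, each carrying its own copy of the key.
shared = "/data/CCBR_Pipeliner/db/PipeDB/lib/fastq_screen.conf"
standard_bin = {
    "exomeseq": {"FASTQ_SCREEN_CONFIG": shared},
    "rnaseq": {"FASTQ_SCREEN_CONFIG": shared},
}

hits = list(find_key_paths(standard_bin, "FASTQ_SCREEN_CONFIG"))
for key_path, value in hits:
    print(" -> ".join(str(part) for part in key_path), "=", value)
```

Running this against both the genome JSON and standard-bin.json (loaded with `json.load`) would show which definitions exist and therefore which one a generated run.json is likely inheriting.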

slsevilla commented 1 year ago

Hi @skchronicles - I see the standard-bin.json, which I can edit, but I'm confused as to why the species-specific JSON files aren't being used. What are those for, if not for these params?

kopardev commented 1 year ago

Fixed.