Implement Archival of Final Data

Gleeson-Lab / wxs_pipeline

Starting with BAMs and FASTQs, follow GATK 4.0 Best Practices up to generating a joint-genotyped VCF


Issue #6

Status: Open (opened by brcopeland 2 years ago)

brcopeland commented 2 years ago

As per the subject. As a final step, the pipeline should archive the final data automatically; otherwise, some other comprehensive solution should be found.
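
A minimal sketch of what an automatic final archival step could look like, assuming the final BAMs and VCFs (plus their indexes) sit under one run directory in scratch. The function names, file patterns, and directory layout are hypothetical and not taken from wxs_pipeline. Each copy is verified by SHA-256 so a truncated transfer is caught before scratch is purged.

```python
import hashlib
import shutil
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading in chunks so large BAMs are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_outputs(run_dir: Path, archive_dir: Path,
                    patterns=("*.bam", "*.bai", "*.vcf.gz", "*.tbi")) -> list:
    """Copy final outputs from scratch into the archive, verifying each copy by checksum.

    Hypothetical helper: the patterns and layout are assumptions. Returns the
    archived paths and raises IOError if any copy differs from its source.
    """
    archived = []
    for pattern in patterns:
        for src in sorted(run_dir.rglob(pattern)):
            # Mirror the run directory's internal layout inside the archive.
            dest = archive_dir / src.relative_to(run_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            if sha256sum(src) != sha256sum(dest):
                raise IOError(f"checksum mismatch after copying {src}")
            archived.append(dest)
    return archived
```

Only files matching the patterns are copied, so intermediate files left in scratch are not archived.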

shishenyxx commented 2 years ago

Right now the final data is still in scratch space, correct? Can we add a separate path in the .yaml file to specify where the final output goes (plus QC and logs for all code and software versions)? Maybe this also relates to your database for all raw and processed data...
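
One way the separate output path could be wired in, sketched under assumptions: `config` is the mapping loaded from the pipeline's .yaml file, and the keys `output_dir` and `archive_dir` are hypothetical names, not the pipeline's actual schema. QC and log directories are derived from the run directory so they travel with the final output.

```python
from pathlib import Path


def resolve_run_paths(config: dict, run_name: str) -> dict:
    """Derive output, QC, log, and archive paths from a parsed config mapping.

    The keys `output_dir` (required) and `archive_dir` (optional) are hypothetical.
    """
    run_dir = Path(config["output_dir"]) / run_name
    paths = {
        "run_dir": run_dir,
        "qc_dir": run_dir / "qc",
        "log_dir": run_dir / "logs",
    }
    archive_root = config.get("archive_dir")
    # With no archive location configured, outputs stay only in scratch.
    paths["archive_dir"] = Path(archive_root) / run_name if archive_root else None
    return paths
```

If PyYAML is installed, `yaml.safe_load` on the config file would produce the `config` dict passed in here.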

brcopeland commented 2 years ago

Yes, currently the user is prompted for a location for output, logs, etc. (this defaults to TSCC scratch). From my perspective, I wonder whether it makes sense to let the user specify an archival location, or whether it should be hard-coded, have restrictions placed on it, etc.
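
A middle ground between a free-form and a hard-coded archival location is to accept a user-supplied path only if it falls under an approved root. This is a sketch of that restriction, not existing pipeline code; the function name is hypothetical, and the default root is borrowed from the later comment in this thread even though the final layout is still undecided.

```python
from pathlib import Path

# Hypothetical policy: archives may only live under this root.
ALLOWED_ARCHIVE_ROOT = Path("/projects/ps-gleesonlab7/Uniformly_processed_data")


def validate_archive_dir(requested: str, allowed_root: Path = ALLOWED_ARCHIVE_ROOT) -> Path:
    """Return the resolved path if it lies inside `allowed_root`, else raise ValueError.

    resolve() collapses `..` components and follows existing symlinks, so a path
    such as `<root>/../elsewhere` is rejected rather than silently accepted.
    """
    candidate = Path(requested).expanduser().resolve()
    root = allowed_root.resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"{candidate} is outside the allowed archive root {root}") from None
    return candidate
```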

brcopeland commented 2 years ago

I do like the idea of archiving the pipeline itself too. The versions are encoded in the wrapper.
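
Archiving the pipeline alongside its outputs could be as simple as copying the pipeline directory and writing a small manifest. In this sketch, `tool_versions` stands in for whatever versions the wrapper already encodes; the function name, manifest fields, and snapshot layout are all hypothetical.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def archive_pipeline(pipeline_dir: Path, archive_dir: Path,
                     pipeline_version: str, tool_versions: dict) -> Path:
    """Snapshot the pipeline source into the archive and write a JSON manifest.

    Hypothetical helper; returns the path of the manifest it wrote.
    """
    # copytree creates archive_dir and its parents; skip VCS metadata and caches.
    shutil.copytree(pipeline_dir, archive_dir / "pipeline",
                    ignore=shutil.ignore_patterns(".git", "__pycache__"))
    manifest = {
        "pipeline": "Gleeson-Lab/wxs_pipeline",
        "pipeline_version": pipeline_version,
        "tool_versions": tool_versions,
        "archived_at": datetime.now(timezone.utc).isoformat(),
    }
    out = archive_dir / "manifest.json"
    out.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return out
```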

shishenyxx commented 2 years ago

Yeah, but they will get lost in scratch space... Maybe talk to Joe once pipeline development is done...

shishenyxx commented 2 years ago

Ideally, we would have a chief bioinformatician who manages all the input and output data, receives tasks from the lab, and assigns them to the bioinfo undergrads (Danny used something like Trello, which looks more like how this is done in industry). The undergrads would run the pipelines with different parameters after meeting with you and the postdoc who requested the analysis, and report back whenever anything happens. In the meantime, all that information goes into a database. That's how a LIMS works, I believe... But in the real world, you know it better than me...

brcopeland commented 2 years ago

Good points. Things have been changing enough that I don't think it makes much sense to have other people running the pipeline yet, but in the hopefully near future that can, and perhaps should, change.

shishenyxx commented 2 years ago

Yeah, now that we have recruited so many undergrad and graduate bioinfo trainees, you should consider this system.

brcopeland commented 2 years ago

The final path should be a subdirectory of /projects/ps-gleesonlab7/Uniformly_processed_data once the structure is decided on. Ideally, this step should also update the Google doc with the paths to the BAMs and VCFs.
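
Before wiring up any Google API, a simpler first step could be generating a CSV index of archived BAM and VCF paths that can be imported into the Google doc or sheet. This sketch assumes a hypothetical `<archive_root>/<batch>/...` layout (the real structure is still undecided), and the function name is illustrative; the actual Google update is not shown.

```python
import csv
from pathlib import Path


def write_path_index(archive_root: Path, index_file: Path) -> int:
    """Write one CSV row per archived BAM or VCF and return the number of rows.

    Assumes the first path component under `archive_root` names the batch.
    """
    rows = []
    for path in sorted(archive_root.rglob("*")):
        if not path.is_file():
            continue
        if path.name.endswith(".bam"):
            kind = "BAM"
        elif path.name.endswith((".vcf", ".vcf.gz")):
            kind = "VCF"
        else:
            continue  # indexes (.bai, .tbi), logs, etc. are not listed
        batch = path.relative_to(archive_root).parts[0]
        rows.append({"batch": batch, "type": kind, "path": str(path)})
    with index_file.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["batch", "type", "path"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```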