Open brcopeland opened 2 years ago
Right now the final data is still in scratch space, correct? Can we set a different path in the .yaml file to specify the final output location (plus QC and logs for all code and software versions)? Maybe this also relates to your database for all raw and processed data...
Yes, the user is currently prompted for a location for output, logs, etc., and this defaults to TSCC scratch. From my perspective, I wonder whether it makes sense to let the user specify an archival location, or whether this should be hard-coded, have restrictions placed on it, etc.
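One middle ground between "user-specified" and "hard-coded" would be to accept a path from the .yaml but only if it stays under a fixed archive root. Below is a minimal sketch of that idea, not the pipeline's actual code. It assumes the .yaml has already been parsed into a dict, that the key is called `final_output_dir` (a hypothetical name), and that the archive root is the /projects/ps-gleesonlab7/Uniformly_processed_data location mentioned in this thread. The scratch default is passed in by the caller because its exact path is not given here.

```python
import os
from pathlib import Path

# Archive root taken from this thread; the subdirectory layout is still undecided.
ARCHIVE_ROOT = Path("/projects/ps-gleesonlab7/Uniformly_processed_data")


def resolve_output_dir(config, scratch_default, archive_root=ARCHIVE_ROOT):
    """Return the directory that final outputs should be written to.

    `config` is the already-parsed .yaml content as a dict. The key name
    "final_output_dir" is hypothetical, not an existing pipeline option.
    """
    requested = config.get("final_output_dir")
    if not requested:
        # No override given: keep the current behaviour (scratch).
        return Path(scratch_default)

    root = Path(os.path.normpath(str(archive_root)))
    # Relative paths are interpreted under the archive root. Normalise
    # without touching the filesystem so ".." cannot escape the root.
    candidate = Path(os.path.normpath(os.path.join(str(root), requested)))
    try:
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"{candidate} is outside the archive root {root}")
    return candidate
```

With this, `final_output_dir: projA/run1` would resolve to a subdirectory of the archive root, while something like `../elsewhere` or an unrelated absolute path would be rejected with a `ValueError` rather than silently written somewhere unexpected.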
I do like the idea of archiving the pipeline itself too. The versions are encoded in the wrapper.
Yeah, but they will get lost in the scratch space... Maybe talk to Joe when the pipeline development is done.
Ideally, we would have a chief bioinformatician who manages all the input and output data, receives tasks from the lab, and assigns them to the bioinformaticians under them (Danny used something like Trello, which looks more like how this is done in industry). Undergrads would run the pipelines with different parameters after meeting with you and the postdoc who made the request, and report back when anything happens. In the meantime, all of that information would go into a database. That's how a LIMS works, I believe. But in the real world... you know it better than I do.
Good points. Things have been changing enough that I don't think it makes much sense to have other people running the pipeline yet, but in the hopefully near future that can, and perhaps should, change.
Yeah, now that we have recruited so many undergraduate and graduate bioinfo trainees, you should consider this system.
The final path should be a subdirectory of /projects/ps-gleesonlab7/Uniformly_processed_data once the structure is decided on. Ideally, this step should also update the Google Doc with the paths to the BAMs and VCFs.
As per subject. As a final step, the pipeline should store this data automatically, or else some other comprehensive solution should be found.
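For discussion, here is a rough sketch of what such a final archiving step could look like. It is only an illustration under several assumptions: the function name `archive_run`, the list of suffixes treated as final results, and the `versions.json` and `manifest.tsv` file names are all made up here, and the `versions` dict stands in for whatever the wrapper actually records. It copies results out of the scratch run directory, stores the versions next to the data so they are not lost with scratch, and writes a table of archived paths that could be used to fill in the Google Doc (updating the doc itself is not attempted).

```python
import csv
import json
import shutil
from pathlib import Path

# File suffixes treated as final results; this list is an assumption.
RESULT_SUFFIXES = (".bam", ".bai", ".vcf", ".vcf.gz", ".tbi")


def archive_run(run_dir, archive_dir, versions):
    """Copy final results out of scratch and record where they ended up."""
    run_dir, archive_dir = Path(run_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    archived = []
    for src in sorted(run_dir.rglob("*")):
        if src.is_file() and src.name.endswith(RESULT_SUFFIXES):
            # Preserve the run's internal directory layout in the archive.
            dest = archive_dir / src.relative_to(run_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also keeps timestamps
            archived.append(dest)

    # Keep software versions next to the data so they outlive scratch.
    (archive_dir / "versions.json").write_text(json.dumps(versions, indent=2))

    # Manifest of archived paths, one row per file.
    with open(archive_dir / "manifest.tsv", "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["file", "path"])
        for path in archived:
            writer.writerow([path.name, str(path)])
    return archived
```

Copying rather than moving leaves the scratch run intact until someone has checked the archive; whether that is the right trade-off given the size of the BAMs is an open question.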