broadinstitute / chem-bio-dos-del

Initiated 2021Q4 for code related to the Broad Chemical Biology DNA-encoded library (DEL) analysis and visualization pipeline
MIT License

Eliminate the need to purge metadata from the pipeline. #9

Open codewarrior2000 opened 2 years ago

codewarrior2000 commented 2 years ago

Sequencing runs from MiSeq and HiSeq are sometimes determined to be unsuccessful only after the FASTQ data have been run through the DEL analysis pipeline. If the counts are too low, the sequencing must be redone by the original sequencing lab or by an alternative lab. As a consequence, the DEL analysis pipeline retains residual metadata from the first sequencing run. Scientists have requested that old metadata be purged from the pipeline for fear that it will interfere with future analysis.

It is not sustainable to keep accommodating requests from DEL analysis app users to delete sample metadata and run metadata for whatever reason. Real multi-user web apps don't accommodate that kind of request from users.

The immediate solution is to instruct users not to reuse run_ids and samp_ids if a sequencing run fails and needs to be resequenced.

The long-term solution is to redesign the app around a relational database that uses auto-incremented indices as the primary keys of the run and sample metadata. Because each row is then identified by its own key rather than by the user-supplied label, users will be able to reuse run_ids and samp_ids when redoing sequencing runs.
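A minimal sketch of what that could look like, using Python's built-in `sqlite3` module. The table names (`seq_run`, `sample`), column names (`run_pk`, `samp_pk`), and helper functions here are hypothetical and are not taken from the current app; the only point is that the auto-incremented key identifies a row, so the same run_id can be registered again without touching the earlier metadata.

```python
import sqlite3

# Hypothetical schema: surrogate keys identify rows; run_id / samp_id are just labels.
SCHEMA = """
CREATE TABLE seq_run (
    run_pk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, never reused
    run_id TEXT NOT NULL                       -- user-facing label, may repeat
);
CREATE TABLE sample (
    samp_pk INTEGER PRIMARY KEY AUTOINCREMENT,
    run_pk  INTEGER NOT NULL REFERENCES seq_run(run_pk),
    samp_id TEXT NOT NULL                      -- may repeat across runs
);
"""


def open_db(path=":memory:"):
    """Open a database and create the sketch schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def register_run(conn, run_id, samp_ids):
    """Insert a run and its samples; return the new surrogate key."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute("INSERT INTO seq_run (run_id) VALUES (?)", (run_id,))
        run_pk = cur.lastrowid
        conn.executemany(
            "INSERT INTO sample (run_pk, samp_id) VALUES (?, ?)",
            [(run_pk, samp_id) for samp_id in samp_ids],
        )
    return run_pk


def latest_run_pk(conn, run_id):
    """Return the most recently registered run for a label, or None."""
    row = conn.execute(
        "SELECT MAX(run_pk) FROM seq_run WHERE run_id = ?", (run_id,)
    ).fetchone()
    return row[0]
```

With this layout, a failed run and its resequenced replacement can share a run_id: calling `register_run(conn, "run_001", [...])` twice yields two distinct `run_pk` values, and downstream analysis can select the newer one via `latest_run_pk` instead of requiring the old metadata to be purged.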

codewarrior2000 commented 2 years ago

Zher Yin analysis issue on DEL analysis app.pdf

This is the email thread from April 26, 2022 to May 3, 2022 between Bruce Hua and Larry Chung discussing resolution of the issue where Zher Yin was seeing incorrect enrichment values in the DEL analysis app.