Closed Chunmingl closed 4 months ago
@Chunmingl I'm a bit hesitant about using SQLite as the intermediate file format for manipulating the summary stats because, in my experience, this format is easily corrupted and not very robust. However, I'm open to trying it in this context to see if it works better. We don't have a large dataset for the gene-level information anyway, so either approach will probably work.
@gaow I am thinking we could generate the summary info for each gene and save it as a small per-gene summary file. If a gene is re-processed for the same data type (same summary file name), its summary file gets overwritten with the updated information. We can then run another script to combine all the summary files in a directory into a single meta file.
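A minimal sketch of the per-gene file idea described above, not the pipeline's actual code. The column names in `FIELDS`, the `<gene>.<data_type>.summary.tsv` naming scheme, and both function names are hypothetical and chosen only for illustration. Each summary is written to a temporary file and then renamed into place, so a re-processed gene replaces its old file in one step.

```python
import csv
import os
import tempfile
from pathlib import Path

# Hypothetical columns; the real summary fields are not specified in this thread.
FIELDS = ["gene", "data_type", "n_variants", "status"]


def write_gene_summary(out_dir, gene, data_type, record):
    """Write (or overwrite) the summary file for one gene and data type."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: same gene + same data type -> same file name.
    target = out_dir / f"{gene}.{data_type}.summary.tsv"
    # Write to a temporary file in the same directory, then rename it over
    # the target so readers never see a half-written summary.
    fd, tmp = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        writer.writerow({"gene": gene, "data_type": data_type, **record})
    os.replace(tmp, target)
    return target


def merge_summaries(summary_dir, meta_path):
    """Combine every per-gene summary in a directory into one meta file."""
    rows = []
    for path in sorted(Path(summary_dir).glob("*.summary.tsv")):
        with open(path, newline="") as fh:
            rows.extend(csv.DictReader(fh, delimiter="\t"))
    with open(meta_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Because each process only ever writes its own gene's file, parallel jobs do not contend for a shared file; only the merge step touches the meta file, and it can be rerun at any time.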
I thought of the SQL format because, if the pipeline constantly updates a single meta file, it will be an issue when multiple processes try to read and write it at the same time.
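For comparison, here is a rough sketch of what the single-meta-file alternative could look like with Python's built-in `sqlite3` module. The table name `gene_summary`, its columns, and the helper names are assumptions for illustration only. The `timeout` argument makes a writer wait for a lock instead of failing immediately, and the composite primary key lets a re-processed gene replace its earlier row.

```python
import sqlite3


def open_meta_db(path):
    """Open (and if needed create) a hypothetical gene-summary database."""
    # Wait up to 30 seconds for another process to release its lock.
    conn = sqlite3.connect(path, timeout=30)
    # WAL mode lets readers proceed while one writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gene_summary ("
        "gene TEXT NOT NULL, "
        "data_type TEXT NOT NULL, "
        "n_variants INTEGER, "
        "PRIMARY KEY (gene, data_type))"
    )
    conn.commit()
    return conn


def upsert_gene(conn, gene, data_type, n_variants):
    """Insert a gene's summary, replacing any earlier row for the same key."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO gene_summary "
            "(gene, data_type, n_variants) VALUES (?, ?, ?)",
            (gene, data_type, n_variants),
        )
```

One caveat relevant to the robustness concern raised earlier: SQLite's WAL mode does not work over network filesystems, and file locking on shared cluster storage can be unreliable, so this approach is safest when the database lives on a local disk.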
May I know more about the export script idea that you mentioned in the meeting?