Ideal defaults for output files and directories

tjthurman commented 2 years ago

Right now, we've got a mix of default outputs for the various results files that we make:

Some files, like the targets.csv and raw data big query file, just get automatically output to the working directory.
The QC and proteiNorm reports, by default, get put into directories named protein_analysis/01_quality_control.
The new limma output functions are now a little all over the place: the tables default to protein_analysis, but the results report doesn't have a default at the moment.
Related to all this is the issue of logging: some functions output logging info to files that are, by default, in a new folder named "logs" in the working directory. Other functions "log" things by saving them as R objects, and other functions (especially the re-written limma functions) don't have logging yet.

Would be good to standardize all this. Maybe separate output folders for things that go to end users versus things that stay with us? Or some other structure that works better with the GCP?

ByrumLab commented 2 years ago

The "txt", "protein_analysis", and "phospho_analysis" folders are what the Python script on the GCP zips. The resulting archive is automatically uploaded to the uams-enduser bucket to send out to the user.

Check out the python zip script here: uams-scripts/python/zip

Within the protein and phospho analysis folders are:

QC folder
DE results
either the txt folder (maxquant raw output) or the "Samples report ....csv" (DIA raw data)
TMT label design file for TMT projects

The Scaffold file I send separately from the uams-output/projects_2022 project directory.

All the other files need to be in a separate "for internal use only" directory containing logs and similar files, in case the project needs to be quickly re-run with additional contrasts or something.
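To make the split concrete, here is a minimal sketch of zipping only the end-user folders and leaving everything else behind. It is not the actual script in uams-scripts/python/zip. The folder names come from this comment, while the function name `zip_end_user_dirs` and the way the archive path is passed in are assumptions for illustration only.

```python
import zipfile
from pathlib import Path

# Folder names taken from the comment above; everything else is hypothetical.
END_USER_DIRS = ("txt", "protein_analysis", "phospho_analysis")


def zip_end_user_dirs(project_dir, zip_path):
    """Zip only the end-user folders found in project_dir.

    Returns the archive-relative names of the files that were added.
    zip_path should live outside the folders being zipped.
    """
    project_dir = Path(project_dir)
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in END_USER_DIRS:
            folder = project_dir / name
            if not folder.is_dir():
                continue  # e.g. no phospho_analysis for a protein-only project
            for path in sorted(folder.rglob("*")):
                if path.is_file():
                    arcname = path.relative_to(project_dir).as_posix()
                    zf.write(path, arcname)
                    archived.append(arcname)
    return archived
```

With a layout like this, anything written to an internal-only folder (logs, R objects, and so on) never reaches the archive, because only the listed folders are walked.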

Ideally, from the log, I should be able to quickly fill out a Methods write-up for how the specific project was analyzed.
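One way this could work, sketched under assumptions: if each function appended its step name and parameters to a structured log, a small helper could turn that log into a draft Methods paragraph. The JSON layout here (a list of entries with `step` and `params` keys) and the function name `methods_from_log` are hypothetical. The package does not currently write a log in this format.

```python
import json


def methods_from_log(log_path):
    """Build a draft Methods paragraph from a JSON run log.

    Assumes the log is a JSON list of {"step": str, "params": dict} entries
    (a hypothetical format, not one the package currently produces).
    """
    with open(log_path) as fh:
        steps = json.load(fh)

    sentences = []
    for step in steps:
        params = ", ".join(f"{k} = {v}" for k, v in step["params"].items())
        if params:
            sentences.append(f"{step['step']} was run with {params}.")
        else:
            sentences.append(f"{step['step']} was run with default settings.")
    return " ".join(sentences)
```

The output would still need editing by hand, but it would give the parameters in the order the steps were run.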

clw09 commented 2 years ago

Absolutely, separate folders for end user and internal use would be good, I think. I end up moving files for the core to the logs or a docs folder myself just to separate things out. Probably some sort of reports folder with various files or R objects to populate certain fields in the HTML report. I've played around with this a little with past markdown stuff. I severely dislike/hate the logging stuff I did, but I committed to the structure for consistency and it got the info out one way or another lol

In addition to better log files, would it be worth setting up some external environments to send parameters or summary info calculated within functions, so the info can be accessed in the report? I am not a coding expert soooo it is probably a bad idea lol :)
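A rough sketch of that idea, written in Python only for illustration (in the package itself this would presumably be an R environment): a shared store that functions write their parameters and summary numbers into, and that the report reads from later. The names `RunInfo`, `RUN_INFO`, and `filter_missing` are all made up for this example.

```python
class RunInfo:
    """Shared store that analysis functions write to and the report reads from."""

    def __init__(self):
        self.entries = []

    def record(self, step, **info):
        # Keep the step name alongside whatever the function wants to report.
        self.entries.append({"step": step, **info})

    def for_step(self, step):
        return [e for e in self.entries if e["step"] == step]


RUN_INFO = RunInfo()  # hypothetical module-level store


def filter_missing(rows, max_missing=0.5):
    """Keep rows whose fraction of missing (None) values is <= max_missing.

    Assumes every row is non-empty.
    """
    kept = [r for r in rows if r.count(None) / len(r) <= max_missing]
    # Send the parameter and the summary counts to the shared store.
    RUN_INFO.record(
        "filter_missing", max_missing=max_missing, n_in=len(rows), n_kept=len(kept)
    )
    return kept
```

The report code could then call `RUN_INFO.for_step("filter_missing")` to fill in fields instead of re-deriving them. The trade-off is hidden global state, so the store would need to be reset between projects.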

Cheers!

tjthurman commented 2 years ago

This is something we'll want to change for the public package: the default should probably just be to output everything in the current working directory (instead of making protein_analysis/01_QC_report and similar files, or subdirectories for the various plots). Then, users can specify something besides the current working directory if they want.
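The pattern described here, sketched in Python purely for illustration (the package itself is R, and `resolve_output_dir` and `write_report` are hypothetical names, not package functions): every writer takes an optional output directory and falls back to the current working directory when none is given.

```python
from pathlib import Path


def resolve_output_dir(output_dir=None):
    """Return the directory to write to: output_dir if given, else the cwd."""
    out = Path.cwd() if output_dir is None else Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create user-specified dirs on demand
    return out


def write_report(text, filename="report.txt", output_dir=None):
    """Write text to filename inside the resolved output directory."""
    path = resolve_output_dir(output_dir) / filename
    path.write_text(text)
    return path
```

Routing every output function through one resolver like this keeps the default in a single place, so an internal version could swap in a different default without touching each function.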

tjthurman commented 1 year ago

No longer relevant to the public version of the package. Something we'll want to figure out for the internal version of the package.

ByrumLab / proteoDA

Ideal defaults for output files and directories #38