Make CLI logging when bulk-processing more friendly (QOL)

ppavlidis commented 1 year ago

Suggestions for the experiment or arrayDesign CLI framework, when processing multiple entities:

[ ] When beginning processing of each new entity, log something like "starting 2/6" so it is easier to tell how far along it is
[ ] When summarizing errors at the very end, put the error message on the same line as the entity info, and put the short name and ID of the experiment first, without parentheses. This will make it easier to use things like grep x | cut -d' ' -f1 to sort the errors into types.

The current format looks like:

ExpressionExperiment Id=3911 Name=Gene Experssion Profiling-Based Identification of Molecular Subtypes in Stage IV Melanoma with Different Clinical Outcome (test set) (GSE22153):
        Missing values not tolerated in design matrix

Instead something like:

GSE22153 3911 <name> <error message>
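
A minimal sketch of both suggestions in Java, not the actual Gemma code: the class and method names (`BatchLogFormat`, `progressLine`, `errorLine`) are hypothetical, and the tab separator is an assumption, chosen so that experiment names containing spaces stay in a single field.

```java
final class BatchLogFormat {

    /** Progress line logged when processing of an entity begins, e.g. "starting 2/6: GSE22153". */
    public static String progressLine(int current, int total, String shortName) {
        return String.format("starting %d/%d: %s", current, total, shortName);
    }

    /**
     * One-line error summary with the short name and ID first and no parentheses.
     * Fields are tab-separated (an assumption, not an existing Gemma convention).
     */
    public static String errorLine(String shortName, long id, String name, String errorMessage) {
        return String.join("\t", shortName, Long.toString(id), oneLine(name), oneLine(errorMessage));
    }

    /** Collapses newlines, tabs and repeated spaces so a value cannot break the row. */
    private static String oneLine(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.err.println(progressLine(2, 6, "GSE22153"));
        System.out.println(errorLine("GSE22153", 3911L,
                "<name>", "Missing values not tolerated in design matrix"));
    }
}
```

With rows in this shape, something like `cut -f1,4` would pull out the short name and the error message for grouping errors by type.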

arteymix commented 1 year ago

I'd advocate instead for tabular output for bulk-processed datasets, enabled via an option, and maybe a companion flag that turns the standard output into tabular output. I think we would also have to ensure that the logs are sent to stderr so that it works with cut and similar tools.

I have a branch that separates the bulk-processing features from AbstractCLI into an AbstractBatchProcessingCLI. That would allow us to add more tailored features without polluting all the tools.
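
To illustrate the stdout/stderr split, here is a rough Java sketch under stated assumptions. `TabularBatchReport`, its methods and its column names are hypothetical and are not the API of AbstractBatchProcessingCLI; the point is only that result rows go to one stream and human-readable logs to another.

```java
import java.io.PrintStream;

final class TabularBatchReport {
    private final PrintStream results; // machine-readable rows, typically System.out
    private final PrintStream log;     // human-readable progress, typically System.err
    private final int total;
    private int started = 0;

    public TabularBatchReport(PrintStream results, PrintStream log, int total) {
        this.results = results;
        this.log = log;
        this.total = total;
        // Header row; the column names are illustrative only.
        results.println("short_name\tid\tstatus\tmessage");
    }

    /** Logs progress to the log stream only, so it never mixes with the table. */
    public void start(String shortName) {
        started++;
        log.printf("starting %d/%d: %s%n", started, total, shortName);
    }

    public void success(String shortName, long id) {
        row(shortName, id, "OK", "");
    }

    public void failure(String shortName, long id, String message) {
        row(shortName, id, "ERROR", message);
    }

    private void row(String shortName, long id, String status, String message) {
        // Collapse whitespace so a multi-line error cannot break the row.
        String clean = message.replaceAll("\\s+", " ").trim();
        results.println(String.join("\t", shortName, Long.toString(id), status, clean));
    }

    public static void main(String[] args) {
        TabularBatchReport report = new TabularBatchReport(System.out, System.err, 1);
        report.start("GSE22153");
        report.failure("GSE22153", 3911L, "Missing values not tolerated in design matrix");
    }
}
```

Because only the table reaches standard output, a pipeline such as `... 2>/dev/null | cut -f1,3` would see clean rows regardless of how verbose the logging is.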

ppavlidis commented 1 year ago

Sounds good

PavlidisLab / Gemma

Make CLI logging when bulk-processing more friendly (QOL) #633