CovertLab / wcEcoli

Whole Cell Model of E. coli
Other
18 stars 4 forks source link

Broken analysis script boilerplate #162

Closed jmason42 closed 6 years ago

jmason42 commented 6 years ago

Nearly every analysis script has the same boilerplate code for direct execution via a terminal: https://github.com/CovertLab/wcEcoli/blob/17048cb215c53d8489a4539a1722f31416925017/models/ecoli/analysis/single/aaCounts.py#L62-L76

This has been copied and pasted everywhere, and at some point it broke because the standard analysis function signature changed. This pattern should be implemented by some utility so we're not copying and pasting code, and likewise so we can change the call pattern to match the new script signature.

Also, I'm going to say that the use of the .__dict__ attribute here is completely wrong.

1fish2 commented 6 years ago

Agreed. We can make all these programs subclass a common base class and their main() functions override an abstract method to make the required parameter pattern explicit, let tools point out problems, and share command-line parsing code. (It'd also be good to add default parameter values.)

Meanwhile, when editing an analysis script, we can quickly repair __main__ by making the validationDataFile parameter optional:

def main(simOutDir, plotOutDir, plotOutFileName, simDataFile, validationDataFile = None, metadata = None):
tahorst commented 6 years ago

I agree that we should have defaults easily loaded or at the very least create the appropriate paths from one base path that gets passed to the function. Also the path referenced by SERIALIZED_KB_DIR doesn't actually exist so our default sim_data object wouldn't be able to be loaded. We could point the constant to the cached/ directory that users should have created if they followed the setup instructions.

Another weird thing about our analysis files is that although nearly all of them have a shebang to run with python, very few actually have permissions that allow execution.

jmason42 commented 6 years ago

Yeah, that's another boilerplate thing that got copied through generations of code. I'd much rather explicitly call python analysis_script.py anyway. Actually, do file permissions even persist through version control? I suppose they might.

1fish2 commented 6 years ago

Yes, version control manages file permissions.

jmason42 commented 6 years ago

Was thinking about removing the shebang lines but thought I'd first check to see what is currently executable. Surprising number of Python files are set to executable:

$ find . -name "*.py" -perm -u+rwx
./wholecell/analysis/analysis_tools.py
./wholecell/states/bulk_molecules.py
./wholecell/states/state.py
./wholecell/states/unique_molecules.py
./wholecell/utils/sparkline.py
./wholecell/utils/units.py
./wholecell/utils/fitting.py
./runscripts/fittingAnalysis/synthetaseTurnover.py
./runscripts/debug/summarize_environment.py
./runscripts/reconstruction/reaction_mass_balance.py
./runscripts/fileManipulation/convertToVariantSimulation.py
./runscripts/fileManipulation/deleteEmptySims.py
./runscripts/fileManipulation/addNonPdfPlots.py
./models/ecoli/analysis/cohort/massFractionSummary.py
./models/ecoli/analysis/single/ntpCounts.py
./models/ecoli/analysis/single/KmOptimization.py
./models/ecoli/analysis/single/glucoseMassYield.py
./models/ecoli/analysis/single/massFractionSummary.py
./models/ecoli/analysis/single/rnapCounts.py
./models/ecoli/analysis/single/evaluationTime.py
./models/ecoli/analysis/single/dntpCounts.py
./models/ecoli/analysis/single/rnaseCounts.py
./models/ecoli/analysis/single/rnapActiveFraction.py
./models/ecoli/analysis/single/mrnaVsProteinCounts.py
./models/ecoli/analysis/single/rnaDegradationCounts.py
./models/ecoli/analysis/single/tRnaCounts.py
./models/ecoli/analysis/single/proteinCounts.py
./models/ecoli/analysis/single/ribosome50SCounts.py
./models/ecoli/analysis/single/ribosomeCapacity.py
./models/ecoli/analysis/single/mrnaCounts.py
./models/ecoli/analysis/single/ribosomeCounts.py
./models/ecoli/analysis/single/aaCounts.py
./models/ecoli/analysis/single/ribosome30SCounts.py
./models/ecoli/analysis/single/massFractions.py
./models/ecoli/analysis/single/mRnaHalfLives.py
./models/ecoli/analysis/single/mediaExchange.py
./models/ecoli/analysis/single/processMassBalance.py
./models/ecoli/analysis/single/KmRNAdecayComparison.py
./models/ecoli/analysis/single/waterCount.py
./models/ecoli/analysis/variant/massFractionSummary.py
./models/ecoli/sim/initial_conditions.py
./models/ecoli/sim/variants/gene_knockout.py
./models/ecoli/sim/variants/EndoKcatFullRNA.py
./models/ecoli/sim/simulation.py
./models/ecoli/processes/rna_degradation.py
./models/ecoli/processes/complexation.py
./models/ecoli/processes/atp_usage.py
./models/ecoli/processes/polypeptide_elongation.py
./models/ecoli/processes/transcript_elongation.py
./models/ecoli/processes/metabolism.py
./models/ecoli/processes/protein_degradation.py
./models/ecoli/processes/polypeptide_initiation.py
./models/ecoli/processes/transcript_initiation.py
./models/ecoli/listeners/ribosome_data.py
./models/ecoli/listeners/transcript_elongation_listener.py
./models/ecoli/listeners/mass.py
./models/ecoli/listeners/rna_degradation_listener.py
./models/ecoli/listeners/ATPhydrolyzedUsage_listener.py
./reconstruction/ecoli/simulation_data.py
./reconstruction/ecoli/dataclasses/constants.py
./reconstruction/ecoli/dataclasses/moleculeGroups.py
./reconstruction/ecoli/dataclasses/state/uniqueMolecules.py
./reconstruction/ecoli/dataclasses/state/state.py
./reconstruction/ecoli/dataclasses/state/stateFunctions.py
./reconstruction/ecoli/dataclasses/state/bulkMolecules.py
./reconstruction/ecoli/dataclasses/process/translation.py
./reconstruction/ecoli/dataclasses/process/transcription.py
./reconstruction/ecoli/dataclasses/process/complexation.py
./reconstruction/ecoli/dataclasses/process/replication.py
./reconstruction/ecoli/dataclasses/process/metabolism.py
./reconstruction/ecoli/dataclasses/process/process.py
./reconstruction/ecoli/dataclasses/relation.py
./reconstruction/ecoli/dataclasses/getterFunctions.py
./reconstruction/ecoli/fit_sim_data_2.py
./reconstruction/ecoli/flat/scripts/kb_to_flat.py
./reconstruction/ecoli/flat/scripts/biomassMetaboliteMod.py
./reconstruction/ecoli/flat/scripts/unifyBulkFiles.py
./reconstruction/ecoli/compendium/growth_data.py
./reconstruction/spreadsheets.py
./runPylintScript.py

A few shell scripts, too:

$ find . -name "*.sh" -perm -u+rwx
./working.sh
./runscripts/debug/time_libraries.sh
./runscripts/reconstruction/markus_spreadsheet_to_kb.sh
./detect_lostruns_time.sh

Much of this is probably copy-and-paste (I assume that retains permissions). E.g. I popped open a random listener file, and it contains no code that would be run (other than a class definition). I'm hesitant about ripping it all out but that might be the only realistic way to clean things up.

1fish2 commented 6 years ago

.sh files should be executable (unless expected to be run via source to modify the calling shell like source activate some-pyenv).

Each .py file with main code and a shebang line should probably be executable. Every other .py file should not have a shebang line or be executable since it will only run via python something.py or imported by other python files.

cp in the shell carries over the execute permissions. Copy/paste into a new editor window should not.

runPylintScript.py is hard wired to /home/users/sajia!

jmason42 commented 6 years ago

That file can probably go, I think it was from when we first tried running a linter. Having come from mostly working in Python, I usually run shell scripts via sh some_script.sh; other than convenience is there a compelling reason to set it to be executable and run it directly?

1fish2 commented 6 years ago

Kind of the point of a shell script is to act like an executable. And running it directly means you don't have to know which shell to run it in, sh or bash or another.

jmason42 commented 6 years ago

Gotcha. I guess that address the shebang lines in Python, too - nowadays we tend to use automatically managed virtual environments but in the past it made sense to specify the path to the installation of Python you need.