Closed jmason42 closed 6 years ago
Agreed. We can make all these programs subclass a common base class and their main()
functions override an abstract method to make the required parameter pattern explicit, let tools point out problems, and share command-line parsing code. (It'd also be good to add default parameter values.)
Meanwhile, when editing an analysis script, we can quickly repair __main__
by making the validationDataFile
parameter optional:
def main(simOutDir, plotOutDir, plotOutFileName, simDataFile, validationDataFile = None, metadata = None):
I agree that we should have defaults easily loaded or at the very least create the appropriate paths from one base path that gets passed to the function. Also the path referenced by SERIALIZED_KB_DIR
doesn't actually exist so our default sim_data
object wouldn't be able to be loaded. We could point the constant to the cached/
directory that users should have created if they followed the setup instructions.
Another weird thing about our analysis files is that although nearly all of them have a shebang to run with python, very few actually have permissions that allow execution.
Yeah, that's another boilerplate thing that got copied through generations of code. I'd much rather explicitly call python analysis_script.py
anyway. Actually, do file permissions even persist through version control? I suppose they might.
Yes, version control manages file permissions.
Was thinking about removing the shebang lines but thought I'd first check to see what is currently executable. Surprising number of Python files are set to executable:
$ find . -name "*.py" -perm -u+rwx
./wholecell/analysis/analysis_tools.py
./wholecell/states/bulk_molecules.py
./wholecell/states/state.py
./wholecell/states/unique_molecules.py
./wholecell/utils/sparkline.py
./wholecell/utils/units.py
./wholecell/utils/fitting.py
./runscripts/fittingAnalysis/synthetaseTurnover.py
./runscripts/debug/summarize_environment.py
./runscripts/reconstruction/reaction_mass_balance.py
./runscripts/fileManipulation/convertToVariantSimulation.py
./runscripts/fileManipulation/deleteEmptySims.py
./runscripts/fileManipulation/addNonPdfPlots.py
./models/ecoli/analysis/cohort/massFractionSummary.py
./models/ecoli/analysis/single/ntpCounts.py
./models/ecoli/analysis/single/KmOptimization.py
./models/ecoli/analysis/single/glucoseMassYield.py
./models/ecoli/analysis/single/massFractionSummary.py
./models/ecoli/analysis/single/rnapCounts.py
./models/ecoli/analysis/single/evaluationTime.py
./models/ecoli/analysis/single/dntpCounts.py
./models/ecoli/analysis/single/rnaseCounts.py
./models/ecoli/analysis/single/rnapActiveFraction.py
./models/ecoli/analysis/single/mrnaVsProteinCounts.py
./models/ecoli/analysis/single/rnaDegradationCounts.py
./models/ecoli/analysis/single/tRnaCounts.py
./models/ecoli/analysis/single/proteinCounts.py
./models/ecoli/analysis/single/ribosome50SCounts.py
./models/ecoli/analysis/single/ribosomeCapacity.py
./models/ecoli/analysis/single/mrnaCounts.py
./models/ecoli/analysis/single/ribosomeCounts.py
./models/ecoli/analysis/single/aaCounts.py
./models/ecoli/analysis/single/ribosome30SCounts.py
./models/ecoli/analysis/single/massFractions.py
./models/ecoli/analysis/single/mRnaHalfLives.py
./models/ecoli/analysis/single/mediaExchange.py
./models/ecoli/analysis/single/processMassBalance.py
./models/ecoli/analysis/single/KmRNAdecayComparison.py
./models/ecoli/analysis/single/waterCount.py
./models/ecoli/analysis/variant/massFractionSummary.py
./models/ecoli/sim/initial_conditions.py
./models/ecoli/sim/variants/gene_knockout.py
./models/ecoli/sim/variants/EndoKcatFullRNA.py
./models/ecoli/sim/simulation.py
./models/ecoli/processes/rna_degradation.py
./models/ecoli/processes/complexation.py
./models/ecoli/processes/atp_usage.py
./models/ecoli/processes/polypeptide_elongation.py
./models/ecoli/processes/transcript_elongation.py
./models/ecoli/processes/metabolism.py
./models/ecoli/processes/protein_degradation.py
./models/ecoli/processes/polypeptide_initiation.py
./models/ecoli/processes/transcript_initiation.py
./models/ecoli/listeners/ribosome_data.py
./models/ecoli/listeners/transcript_elongation_listener.py
./models/ecoli/listeners/mass.py
./models/ecoli/listeners/rna_degradation_listener.py
./models/ecoli/listeners/ATPhydrolyzedUsage_listener.py
./reconstruction/ecoli/simulation_data.py
./reconstruction/ecoli/dataclasses/constants.py
./reconstruction/ecoli/dataclasses/moleculeGroups.py
./reconstruction/ecoli/dataclasses/state/uniqueMolecules.py
./reconstruction/ecoli/dataclasses/state/state.py
./reconstruction/ecoli/dataclasses/state/stateFunctions.py
./reconstruction/ecoli/dataclasses/state/bulkMolecules.py
./reconstruction/ecoli/dataclasses/process/translation.py
./reconstruction/ecoli/dataclasses/process/transcription.py
./reconstruction/ecoli/dataclasses/process/complexation.py
./reconstruction/ecoli/dataclasses/process/replication.py
./reconstruction/ecoli/dataclasses/process/metabolism.py
./reconstruction/ecoli/dataclasses/process/process.py
./reconstruction/ecoli/dataclasses/relation.py
./reconstruction/ecoli/dataclasses/getterFunctions.py
./reconstruction/ecoli/fit_sim_data_2.py
./reconstruction/ecoli/flat/scripts/kb_to_flat.py
./reconstruction/ecoli/flat/scripts/biomassMetaboliteMod.py
./reconstruction/ecoli/flat/scripts/unifyBulkFiles.py
./reconstruction/ecoli/compendium/growth_data.py
./reconstruction/spreadsheets.py
./runPylintScript.py
A few shell scripts, too:
$ find . -name "*.sh" -perm -u+rwx
./working.sh
./runscripts/debug/time_libraries.sh
./runscripts/reconstruction/markus_spreadsheet_to_kb.sh
./detect_lostruns_time.sh
Much of this is probably copy-and-paste (I assume that retains permissions). E.g. I popped open a random listener file, and it contains no code that would be run (other than a class definition). I'm hesitant about ripping it all out but that might be the only realistic way to clean things up.
.sh files should be executable (unless expected to be run via source
to modify the calling shell like source activate some-pyenv
).
Each .py file with main code and a shebang line should probably be executable. Every other .py file should not have a shebang line or be executable since it will only run via python something.py
or imported by other python files.
cp in the shell carries over the execute permissions. Copy/paste into a new editor window should not.
runPylintScript.py is hard wired to /home/users/sajia
!
That file can probably go, I think it was from when we first tried running a linter. Having come from mostly working in Python, I usually run shell scripts via sh some_script.sh
; other than convenience is there a compelling reason to set it to be executable and run it directly?
Kind of the point of a shell script is to act like an executable. And running it directly means you don't have to know which shell to run it in, sh
or bash
or another.
Gotcha. I guess that address the shebang lines in Python, too - nowadays we tend to use automatically managed virtual environments but in the past it made sense to specify the path to the installation of Python you need.
Nearly every analysis script has the same boilerplate code for direct execution via a terminal: https://github.com/CovertLab/wcEcoli/blob/17048cb215c53d8489a4539a1722f31416925017/models/ecoli/analysis/single/aaCounts.py#L62-L76
This has been copied and pasted everywhere, and at some point it broke because the standard analysis function signature changed. This pattern should be implemented by some utility so we're not copying and pasting code, and likewise so we can change the call pattern to match the new script signature.
Also, I'm going to say that the use of the
.__dict__
attribute here is completely wrong.