Dlux804 / McQuade-Chem-ML

Development of easy to use and reproducible ML scripts for chemistry.

Made changes to code pipeline to allow main.py to run. Also fixed several TODO's #55

Closed dickeygh closed 4 years ago

dickeygh commented 4 years ago

RAW COMMITS:

COMMIT 1: Added "from core.etc..." to lines 17-23 of models.py. Uncommented the confusion_matrix and classification_report on lines 101-102 of train.py

COMMIT 2: Added "from core import...." to lines 9-11 of compare.py so that the code would compile/run properly. Added "import cirpy" to ingest.py so that line 53 could run, as it uses cirpy.

COMMIT 3: Fixed the "TODO Make task identification automatic based on dataset" by adding an if statement that sets self.task_type to 'regression' or 'classification' based on the dataset that was being used. Also removed "task" from the init variables as a result.

COMMIT 4: IN MODELS.PY: Added features.featurize and features.datasplit functions to the run() function in order to featurize and split the dataset. Commented out lines 118-120, line 128, line 144, line 147, and line 153, as main.py would not run properly with these lines. IN NAME.PY: Changed the file directory to match my local directory, as I could not get it to run with any version of "../dataFiles" etc. IN MAIN.PY: Removed the drop=True argument in line 66, as this is no longer necessary. Commented out line 75, as model.featurization is no longer a function. Removed the tune=True argument in line 77, as this is no longer necessary.

COMMIT 5: IN MODELS.PY: Changed lines 158-163 to be correctly commented for my Windows machine. Could we just ask the user what OS they are using? Added comments to lines 82 and 83. IN MAIN.PY: Deleted unnecessary old comments/code.

COMMIT 6: IN MODELS.PY: Changed feat_meth to receive a featurization option number (from main.py) instead of using [0] every time. IN NAME.PY: Added classifier models to algorithm_list. IN TRAIN.PY: Commented out lines 79 and 80 to use conf and clsrep as individual variables instead of arrays. Added print statements to show the user accuracy score, confusion matrix, classification report, and roc_auc_score for each run. Commented out the .mean and .std for classification report, as it is attempting to average string variables (lines 125-130). IN MAIN.PY: Changed line 66 and line 90 to send a method (featurization option). Removed outdated code at old lines 97-98. Changed line 98 to fit new pipeline.

COMMIT 7: IN MAIN.PY: Added comments next to the declaration of feats list. Removed unnecessary code at lines 51-54.
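
On the question in COMMIT 5 about asking the user for their OS: one alternative, sketched here only as a suggestion, is to let Python detect the platform so that nobody has to comment lines in and out by hand. The helper names `running_on_windows` and `pick_for_os` are hypothetical and are not taken from models.py; only `platform.system()` is a real standard-library call.

```python
import platform


def running_on_windows() -> bool:
    """Return True when the interpreter reports a Windows host."""
    # platform.system() returns e.g. 'Windows', 'Linux' or 'Darwin'.
    return platform.system() == "Windows"


def pick_for_os(windows_value, other_value, on_windows=None):
    """Return the value that matches the current OS.

    `on_windows` can be passed explicitly (useful in tests);
    when it is None the platform is detected automatically.
    """
    if on_windows is None:
        on_windows = running_on_windows()
    return windows_value if on_windows else other_value


if __name__ == "__main__":
    # Hypothetical usage: choose between two OS-specific alternatives.
    print(pick_for_os("windows branch", "mac/linux branch"))
```

With something like this, the OS-specific lines could live behind a single if/else instead of being toggled per machine.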

SUMMARY:

COMPARE.PY

  1. Changed lines 9-11 to have "from core" so that it would run properly.
  2. Commented out line 444 so it would run properly.

INGEST.PY

  1. Added "import cirpy" at line 3 in order to have it run properly.

MODELS.PY

  1. Added "from core." to lines 17-23 so it would run properly.
  2. Removed task declaration (line 27). This is because the program now automatically sets self.task_type based on the dataset being used (lines 34-38).
  3. Changed feat_meth = [0] to feat_meth (line 27). This is so it will use whichever featurization option is set in main.py.
  4. Added featurization and datasplit to init function (lines 82-83)
  5. Commented out lines 117-120, 128, 144, 153 because they were breaking it while running from main.py.
  6. Changed comments at lines 158-163 to work on my Windows machine.
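
The automatic task identification from item 2 can be sketched as a lookup on the dataset file name. This is a minimal illustration and not the code in models.py: the function `infer_task_type` and every dataset name in the two sets are hypothetical placeholders.

```python
from pathlib import PurePath

# Hypothetical dataset names; replace with the files the project actually uses.
CLASSIFICATION_SETS = {"toxicity_labels.csv", "activity_flags.csv"}
REGRESSION_SETS = {"solubility_values.csv", "logp_values.csv"}


def infer_task_type(dataset: str) -> str:
    """Return 'classification' or 'regression' based on the dataset file name."""
    # Compare on the bare, lower-cased file name so directories and case do not matter.
    name = PurePath(dataset).name.lower()
    if name in CLASSIFICATION_SETS:
        return "classification"
    if name in REGRESSION_SETS:
        return "regression"
    raise ValueError(f"Unknown dataset {dataset!r}: add it to one of the lookup sets.")
```

Raising an error for an unknown file, rather than falling back to a default, avoids silently training the wrong kind of model.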

NAME.PY

  1. Changed directory in line 17 to work on my machine (could not get it to work with any variation of ./dataFiles for some reason).
  2. Added classification algorithms to algorithm_list (line 68). This is so classification would run properly from main.py.

TRAIN.PY

  1. Commented out lines 79-80 so that conf and clsrep could be used as non-array variables.
  2. Changed lines 100-112 to print out classification metrics for each run.
  3. Changed comments from lines 125-130 in order to get it to run properly. This is because some of these lines are trying to average string variables. I will look for a way around this so that we can use arrays to store the classification report and confusion matrix.
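
One possible way around the string-averaging problem in item 3, assuming clsrep comes from scikit-learn's `classification_report`: calling it with `output_dict=True` returns nested dictionaries of numbers instead of a formatted string, and those can be averaged across runs. The helper `average_reports` below is hypothetical, and the sample values are hand-written to mimic that shape; they are not real results.

```python
from statistics import mean


def average_reports(reports):
    """Average a list of report dictionaries key by key.

    Each report is expected to look like classification_report(..., output_dict=True):
    a label (or 'macro avg', etc.) maps to a dict of metrics, while an entry
    such as 'accuracy' maps directly to a float.
    """
    averaged = {}
    for key, first in reports[0].items():
        if isinstance(first, dict):
            # Per-label entry: average each metric separately.
            averaged[key] = {
                metric: mean(r[key][metric] for r in reports) for metric in first
            }
        else:
            # Scalar entry such as overall accuracy.
            averaged[key] = mean(r[key] for r in reports)
    return averaged


# Hand-written sample values in the expected shape (not real results).
run_1 = {"0": {"precision": 0.8, "recall": 0.6}, "accuracy": 0.7}
run_2 = {"0": {"precision": 0.6, "recall": 0.8}, "accuracy": 0.9}
print(average_reports([run_1, run_2]))
```

The confusion matrix is already a numeric array, so it can be stacked and averaged element-wise without this conversion.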

MAIN.PY

  1. Added comments to lines 26 and 39 (this is where you set up which featurization options you want).
  2. Removed unnecessary code from lines 51-54.
  3. Changed line 62 to send the featurization method and removed drop=True to match new pipeline.
  4. Removed unnecessary code from old lines 74-75.
  5. Removed tune=True from line 72 to fit new pipeline.
  6. Added model.analyze at line 73.
  7. Changed line 86 to send a featurization method and removed drop=False to match new pipeline.
  8. Removed unnecessary code from old lines 97-98.
  9. Changed line 94 to model.run() instead of model.classification_run(). This fits the new pipeline.

Other notes/thoughts: I apologize if any of my solutions or changes are not taking the pipeline in the direction that has been envisioned. I left all of the TODO comments in the code in case the solutions that I implemented are not accomplishing the perceived goal. I envision this PR going through multiple iterations before it is ready to be sent to the dev branch. Please check this out, run main.py, and provide me with feedback.

pep8speaks commented 4 years ago

Hello @dickeygh! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 444:1: E265 block comment should start with '# '

Line 17:80: E501 line too long (96 > 79 characters)
Line 27:5: E303 too many blank lines (3)
Line 27:80: E501 line too long (109 > 79 characters)
Line 27:101: E251 unexpected spaces around keyword / parameter equals
Line 27:103: E251 unexpected spaces around keyword / parameter equals
Line 37:80: E501 line too long (189 > 79 characters)
Line 82:34: E261 at least two spaces before inline comment
Line 116:9: E265 block comment should start with '# '
Line 117:9: E265 block comment should start with '# '
Line 118:9: E265 block comment should start with '# '
Line 119:9: E265 block comment should start with '# '
Line 127:9: E265 block comment should start with '# '
Line 143:9: E265 block comment should start with '# '
Line 146:9: E265 block comment should start with '# '
Line 146:80: E501 line too long (82 > 79 characters)
Line 152:9: E265 block comment should start with '# '
Line 152:80: E501 line too long (82 > 79 characters)
Line 157:80: E501 line too long (105 > 79 characters)
Line 158:80: E501 line too long (101 > 79 characters)
Line 161:80: E501 line too long (99 > 79 characters)
Line 162:80: E501 line too long (94 > 79 characters)

Line 17:80: E501 line too long (121 > 79 characters)
Line 68:80: E501 line too long (118 > 79 characters)

Line 79:5: E265 block comment should start with '# '
Line 80:5: E265 block comment should start with '# '
Line 129:8: E131 continuation line unaligned for hanging indent

Line 26:80: E501 line too long (169 > 79 characters)
Line 26:87: E261 at least two spaces before inline comment
Line 39:80: E501 line too long (165 > 79 characters)
Line 39:87: E261 at least two spaces before inline comment
Line 72:38: E261 at least two spaces before inline comment
Line 74:36: E261 at least two spaces before inline comment
Line 95:38: E261 at least two spaces before inline comment

Comment last updated at 2020-06-16 21:22:04 UTC

dickeygh commented 4 years ago

In the last commit, I made the following changes:

MODELS.PY

  1. Removed line 82 because this was moved to main.py (lines 72 and 95)

NAME.PY

  1. Added import pathlib (line 4) so that it could be used in line 17.
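
A minimal sketch of how pathlib can replace a machine-specific directory string: the data folder is resolved relative to the source file rather than the current working directory, so the result does not depend on where the script is launched or which OS it runs on. The folder name dataFiles comes from the discussion above; the assumption that it sits one level above the module's directory, and the helper name `data_dir`, are hypothetical and may not match line 17 of name.py.

```python
from pathlib import Path


def data_dir(module_file: str, folder: str = "dataFiles") -> Path:
    """Return <parent of the module's directory>/<folder> as an absolute path."""
    # Directory that contains the calling module, made absolute.
    module_dir = Path(module_file).resolve().parent
    # Assumed layout: the data folder is a sibling of that directory.
    return module_dir.parent / folder


# Inside a module such as name.py this would be called as data_dir(__file__).
# Here a literal example path is used so the snippet runs on its own.
print(data_dir("project/core/name.py"))
```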

MAIN.PY

  1. Added "from core.features import featurize" (line 7) so that featurization could be performed in main.py (lines 72 and 95).