banillie / analysis_engine

Place for all code used to compile the quarterly PfM Report, manage GMPP data, as well as other useful data searching/analysis functions.
MIT License
2 stars 2 forks source link

Simplifying your set-up #5

Closed yulqen closed 3 years ago

yulqen commented 4 years ago

As discussed, there is scope for simplification here. You don't need Pycharm for running scripts in this repo.

Really, you could replace all the guidance at https://github.com/banillie/analysis_engine#virtual-environment-set-up with something much simpler.

It would mean your staff running commands in a terminal again, but there are advantages to this:

Instead, the could have a cheatsheet of commands to type into a terminal, or even copy and paste and hit return.

I recommend that they use the new Windows 10 terminal (https://www.microsoft.com/en-us/p/windows-terminal/9n0dx20hk701) but cmd.exe in Windows would do, or better use the Windows Powershell terminal. I can discuss with you the subtle differences of using these commands in Windows compared to Ubuntu, but in practice nowadays, there is very little difference.

If you wanted to remove the whole Pycharm bit from this process, this is how you could set up the virtualenv without it:

Then, when they want to run scripts in your code, they just ensure they are in the analysis_engine root directory, activate the virtualenv, and fire off the commands:

banillie commented 4 years ago

Sounds great. Will have a look/play around with it. Would be good to fix up sometime with you to discuss. Will try and find sometime in our diaries. Cheers, Will.

yulqen commented 4 years ago

Yeah - just something to consider. You might find they're in the groove with the current set up, but definitely worth considering. Let me know when you're available and I'd be happy to go through it.

banillie commented 3 years ago

@yulqen Check out main.py for the latest on where I've got to on putting code into a cml. First sprint completed and working. I'm going to re-read the real python argparse tutorial and see how I can refactor and improve. But, would also welcome your input. Would be great to discuss on tomorrow. Cheers.

yulqen commented 3 years ago

Have pulled latest repo, created a virtualenv, installed from requirements.txt and blithely done python main.py and get ImportError at this point. Will investigate...

Traceback (most recent call last):
  File "/home/lemon/code/python/analysis_engine/main.py", line 27, in <module>
    from data_mgmt.data import (
  File "/home/lemon/code/python/analysis_engine/data_mgmt/data.py", line 9, in <module>
    import matplotlib.pyplot as plt
  File "/home/lemon/code/python/analysis_engine/.venv/lib/python3.9/site-packages/matplotlib/__init__.py", line 1111, in <module>
    rcParamsOrig = RcParams(rcParams.copy())
  File "/home/lemon/code/python/analysis_engine/.venv/lib/python3.9/site-packages/matplotlib/__init__.py", line 891, in __getitem__
    from matplotlib import pyplot as plt
  File "/home/lemon/code/python/analysis_engine/.venv/lib/python3.9/site-packages/matplotlib/pyplot.py", line 32, in <module>
    import matplotlib.colorbar
  File "/home/lemon/code/python/analysis_engine/.venv/lib/python3.9/site-packages/matplotlib/colorbar.py", line 40, in <module>
    import matplotlib._constrained_layout as constrained_layout
  File "/home/lemon/code/python/analysis_engine/.venv/lib/python3.9/site-packages/matplotlib/_constrained_layout.py", line 52, in <module>
    from matplotlib.legend import Legend
  File "/home/lemon/code/python/analysis_engine/.venv/lib/python3.9/site-packages/matplotlib/legend.py", line 43, in <module>
    from matplotlib.offsetbox import HPacker, VPacker, TextArea, DrawingArea
  File "/home/lemon/code/python/analysis_engine/.venv/lib/python3.9/site-packages/matplotlib/offsetbox.py", line 33, in <module>
    from matplotlib.image import BboxImage
  File "/home/lemon/code/python/analysis_engine/.venv/lib/python3.9/site-packages/matplotlib/image.py", line 19, in <module>
    from matplotlib.backend_bases import FigureCanvasBase
  File "/home/lemon/code/python/analysis_engine/.venv/lib/python3.9/site-packages/matplotlib/backend_bases.py", line 46, in <module>
    from matplotlib import (
ImportError: cannot import name 'get_backend' from partially initialized module 'matplotlib' (most likely due to a circular import) (/home/lemon/code/python/analysis_engine/.venv/lib/python3.9/site-packages/matplotlib/__init__.py)
yulqen commented 3 years ago

If you want to share what your intentions are for your CLI interface, I'm happy to help you pull it together, including doing the packaging for PyPI, etc.

EDIT2: I am making progress understanding what you're after here...

EDIT: I see your module comment at the top! (by the way, I think you mean "CLI" rather than "cml". CLI is commonly usaged to mean "command line interface" and that's what we're doing here.

Looks like you want vfm to be your basic command and have -vfm, -stage, -group, and -quarters as your arguments. Tip: arguments in argparse are usually of the form stage rather than -stage (although there are environments where the single dash is used. Go programmers tend to use the single dash actually - not so much in python.). So, for example, you'd have vfm run stage and vfm run quarters. Is that right? If that is the case, you don't actually need run. You would have, say, vfm stage or whatever.

To be honest, I've not used argparse for a couple of years so I can't remember. If I could get this import error fixed I could start to have a look. Something to do with matplotlib?

banillie commented 3 years ago

Yes I think there is a problem with the matplotlib versioning. It works fine on my computer, but I think it causes problems on windows.

Thanks for all the comments. Will look at in detail tonight.

yulqen commented 3 years ago

Upgraded to matplotlib v3.3.0 and it seems to be fine.

banillie commented 3 years ago

Great. Will try that version on my machine and see what happens.

banillie commented 3 years ago

3.3.0 works fine. There are a few, matplotlib related, areas of the code which now crash, but to be honest not entirely sure what they did and if I hash them out code seems to run fine.

I was aware of this versioning issue, so its good to tackle it now. Have updated the requirements.txt

banillie commented 3 years ago

btw the arguments would ideally be something like

analysis_engine vfm -quarters "Q2 20/21" "Q2 19/20" -stage "FBC" or analysis engine vfm -group "Rail"

much like datamaps has datamaps import templates or datamaps export masters -d [PATH]

yulqen commented 3 years ago

Gotcha. You don't have to adhere to it, but just to note that "options" on the command line are most commonly of the form --quarters, --group, --stage. When you use the full word, you use --quarters - the double dash. -q would be the equivalent - it's the short form. The user can choose which to use. See in the docs here the add_argument() function allows you to specify both forms.

Datamaps actually uses subcommands. In the example you give, datamaps is the program. import is the command. templates is the subcommand. I don't think you'll need subcommands yet but you can do them with argparse if necessary.

In the case of datamaps export masters -d - datamaps is the program. export is the command. masters is the subcommand. -d is the option, and --datamap is also legal here.

banillie commented 3 years ago

Yes so I think it would be best to have it like this. analysis engine the program vfm the command . And then options e.g. -q or --quarters another example would be something like analysis engine the program project reports the command. This would produce a different output to vfm. Yes don't think I'll need sub commands.

yulqen commented 3 years ago

Yep. project reports would have to be project-reports (dash rather than underscore). I don't think you can use multiple words.

yulqen commented 3 years ago

I also recommend your program just be engine or ae rather than analysis_engine which is a bit long. You could create an alias in bash to do the same but you can easily package your python code to do it.

banillie commented 3 years ago

Hi Matt, let me know if you might have sometime for a coding session today. I'm only available until 2pm today. Would be great to go through the above. If you can't this week no worries though. Cheers.

banillie commented 3 years ago

@yulqen We've come a long way on this issue! Thanks for all your help and guidance. I uploaded onto PyPI early the analysis_engine package. I left a few messages on Teams for you (in case you didn't see them). Now thinking about how to best install and operate the a_e package onto my team's computers, which are of course windows. Do you think it's best for them to download another shell? Or stick with the basic cmd? They all have Python already installed so it should be a case of checking the Python version and then pip installing via python3 -m pip install analysis_engine and from there it will be analysis.exe subcommand. Is that right?

yulqen commented 3 years ago

Congratulations! I installed it before I saw this message.

At some point soon you'll need to handle this exception, which happens whenever I try to call any command such as analysis risks:

Traceback (most recent call last):
  File "/tmp/toss/bin/analysis", line 8, in <module>
    sys.exit(main())
  File "/tmp/toss/lib/python3.9/site-packages/analysis_engine/main.py", line 208, in main
    args.func(vars(args))
  File "/tmp/toss/lib/python3.9/site-packages/analysis_engine/main.py", line 48, in vfm
    m = Master(get_master_data(), get_project_information())
  File "/tmp/toss/lib/python3.9/site-packages/analysis_engine/data.py", line 55, in get_master_data
    project_data_from_master(root_path / "core_data/master_3_2020.xlsx", 3, 2020),
  File "/tmp/toss/lib/python3.9/site-packages/datamaps/api/api.py", line 14, in project_data_from_master_api
    m = Master(Quarter(quarter, year), master_file)
  File "/tmp/toss/lib/python3.9/site-packages/datamaps/plugins/dft/master.py", line 120, in __init__
    self._data = project_data_from_master(self.path)
  File "/tmp/toss/lib/python3.9/site-packages/datamaps/plugins/dft/portfolio.py", line 12, in project_data_from_master
    wb = load_workbook(master_file)
  File "/tmp/toss/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 315, in load_workbook
    reader = ExcelReader(filename, read_only, keep_vba,
  File "/tmp/toss/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 124, in __init__
    self.archive = _validate_archive(fn)
  File "/tmp/toss/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 96, in _validate_archive
    archive = ZipFile(filename, 'r')
  File "/usr/lib/python3.9/zipfile.py", line 1239, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '/home/lemon/Documents/analysis_engine/core_data/master_3_2020.xlsx'

Now that your code is publically available, you should be the effort to tidy up bugs like this and give helpful messages. I know you only intend to use it internally, but you never know! Plus your team will be using it. Take some time to improve the help documentation too - all can be added to --help.

I haven't seen anything on Teams?

The shell is the name of the program that runs the command line (in our case no Linux we use bash as the shell). For your Windows use, it's fine to use cmd.exe. There is another shell on Windows called PowerShell, but they probably don't need to use. Yes, you might need to help them set the PATH variable, that can be a pain in Windows, but apart from that, you're correct: pip install analysis_engine should do it.

Great stuff Will! Keep in touch with how it's going.

yulqen commented 3 years ago

By the way, you should also add a license if you haven't already.

banillie commented 3 years ago

Thanks Matt. Yes I haven't given to you the latest master.wb and it's absence if causing the crash. I'll drop it into Teams now. How the master data is uploaded into the programme is something I need to look at and modify because at the moment it's hard coded, and it can't be that way going forward. But yes there will definitely need to be checks and messaging happening at the upload point... related to that is the next step which will be to keep the master data in a database or perhaps as an interim saved in pickles?

Re PATH is this the path that python uses to run? the same issue as the option that is provided when python is downloaded, but not included as default for some reason?

Yes have the licence is in there.

Cheers.

yulqen commented 3 years ago

The PATH to where pip installs packages also needs to be added. The user should get a message about this the first time they install a package using pip. Without fixing this, they will have to run something like C:\Users\whoever\AppData\Roaming\Local\Programs\PipPackages\asdokoKo3090\analysis.exe ... instead of analysis ....

https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/

banillie commented 3 years ago

@yulqen I'm going to close this issue. Joe and Robert successful install via pip install both analysis_engine and datamaps onto their machines this morning. Annie should hopefully be able to do it also later today. We didn't encounter any problems with the PATH to where pip installs thankfully.

I also published up a newer package v0.0.2 of analysis_engine this morning. If you want to try it out you need to create a new folder analysis_engine/core_data/pickle . And the command analysis initiate has to be done first. analysis_engine now stores its Master class data structure into a pickle file, and access data that way, rather than taking the data out of the excel master files each time. See what you think.

Cheers. :rocket: