demiangomez / Parallel.GAMIT

Python wrapper to parallelize GAMIT executions
BSD 3-Clause "New" or "Revised" License

Plotting for station network [Review / Merge] #90

Closed: espg closed this 1 month ago

espg commented 2 months ago

This PR adds plotting functionality for visualizing the global segmentation of GPS stations into subnetworks. Plotting for individual subnetworks will be added in a separate pull request.

Example of the function output:

[Image: example output of the global subnetwork plot]

Some notes / caveats for this PR:

  1. The plotting functionality in this PR introduces new installation dependencies:
    • networkx for network connectivity
    • basemap (from mpl_toolkits) for plotting on maps and in projected spaces
    • matplotlib, required by basemap
    • pyproj to convert ECEF coordinates to lat / lon values for plotting on the maps
  2. Because this plots the global distribution of networks, it is run from the pyNetwork module, not from pyGamitSession. This means the plots are not distributed and run in parallel; instead, they are run sequentially during the clustering segmentation. The global plots require all of the subnetworks, but pyNetwork only sends subnetwork information to pyGamitSession, so these plots are created in pyNetwork, where the station clustering and labels for the entire globe are available.
  3. Run time appears to be roughly 2-3 seconds per day of station data processed. This number is approximate: benchmarking plots is difficult because, when debugging plots in Jupyter notebooks, rendering in the browser adds latency, so %%time in the notebook gives a higher estimate of 7-8 seconds per plot. Furthermore, the plots still render in Jupyter even when only a file output is requested. The 2-3 second figure was measured with the time module and excludes the time needed to format the PNG and write it to disk.
  4. The reference plot above can be modified as needed. Do we want any of the following?
    • a histogram of the subnetwork sizes?
    • text information (title, subtitles, station counts, etc.)?
    • higher DPI?
    • larger/smaller image output? (right now the image size / ratio is 8 by 12, which approximates 8.5 by 11 printer paper)
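The global plot needs every subnetwork at once, which is why it lives in pyNetwork rather than pyGamitSession. A minimal sketch of that grouping step, assuming each station carries a single cluster label in the same order as the station list (the helper names and station codes are hypothetical, not pyNetwork's actual API). It also computes the size counts that the optional subnetwork-size histogram would display:

```python
from collections import Counter, defaultdict


def group_by_subnetwork(stations, labels):
    """Return {label: [station, ...]}, one entry per subnetwork.

    Hypothetical helper: assumes one cluster label per station, in the same
    order as the station list.
    """
    if len(stations) != len(labels):
        raise ValueError("stations and labels must have the same length")
    groups = defaultdict(list)
    for station, label in zip(stations, labels):
        groups[label].append(station)
    return dict(groups)


def size_histogram(groups):
    """Return {subnetwork size: number of subnetworks with that size}."""
    counts = Counter(len(members) for members in groups.values())
    return dict(sorted(counts.items()))


# Made-up station codes and labels, for illustration only.
stations = ["stn.a01", "stn.a02", "stn.b01", "stn.c01", "stn.c02"]
labels = [0, 0, 1, 2, 2]
groups = group_by_subnetwork(stations, labels)
print(groups)                  # {0: ['stn.a01', 'stn.a02'], 1: ['stn.b01'], 2: ['stn.c01', 'stn.c02']}
print(size_histogram(groups))  # {1: 1, 2: 2}
```

If stations can belong to more than one subnetwork (e.g. shared tie stations), the input would need to be a per-subnetwork list rather than one label per station.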
demiangomez commented 2 months ago

Is pyproj really necessary? We have a function in utils that converts from ECEF to lla, see https://github.com/demiangomez/Parallel.GAMIT/blob/a2f9bedf74e37321678c54ff4c5b1bb448405f21/classes/Utils.py#L385, which should be more than enough for this application. Adding another dependency for this seems like overkill, unless we are using it for something else.

espg commented 2 months ago

@demiangomez we can use parallelGamit.classes.Utils.ecef2lla for this... it's actually probably preferable, since the pyproj method is deprecated and will stop working at some point in the future.

I had to modify ecef2lla to accept arrays of coordinates; as originally written, it only takes single ECEF values. This is a very simple fix and shouldn't break anything... I've added it as a commit within this PR.
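The Utils implementation itself is not shown in this thread, so the following is only a stand-alone sketch of what an array-capable ecef2lla can look like. Assumptions: WGS84 constants, a (lat, lon, height) return order in degrees and meters, and plain Python sequences rather than NumPy arrays; the helper _ecef2lla_point is hypothetical. Latitude is solved with a short fixed-point iteration, and scalar inputs still return a single tuple, so existing single-point callers keep working:

```python
import math
from numbers import Real

# WGS84 ellipsoid constants (assumed; the thread does not say which ellipsoid Utils uses).
WGS84_A = 6378137.0                      # semi-major axis [m]
WGS84_F = 1.0 / 298.257223563            # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)     # first eccentricity squared


def _ecef2lla_point(x, y, z, iterations=10):
    """Convert one ECEF point [m] to (lat [deg], lon [deg], height [m])."""
    lon = math.atan2(y, x)
    p = math.hypot(x, y)                          # distance from the spin axis
    lat = math.atan2(z, p * (1.0 - WGS84_E2))     # initial guess
    for _ in range(iterations):
        # Fixed-point update: tan(lat) = (z + e2 * N * sin(lat)) / p
        sin_lat = math.sin(lat)
        n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * sin_lat ** 2)
        lat = math.atan2(z + WGS84_E2 * n * sin_lat, p)
    sin_lat = math.sin(lat)
    # Height formula that avoids dividing by cos(lat), so it also works at the poles.
    h = p * math.cos(lat) + z * sin_lat - WGS84_A * math.sqrt(1.0 - WGS84_E2 * sin_lat ** 2)
    return math.degrees(lat), math.degrees(lon), h


def ecef2lla(x, y, z):
    """Accept scalars or equal-length sequences of ECEF coordinates."""
    if all(isinstance(v, Real) for v in (x, y, z)):
        return _ecef2lla_point(x, y, z)
    xs, ys, zs = list(x), list(y), list(z)
    if not len(xs) == len(ys) == len(zs):
        raise ValueError("x, y and z must have the same length")
    points = [_ecef2lla_point(*xyz) for xyz in zip(xs, ys, zs)]
    lats = [pt[0] for pt in points]
    lons = [pt[1] for pt in points]
    heights = [pt[2] for pt in points]
    return lats, lons, heights


print(ecef2lla(6378137.0, 0.0, 0.0))   # (0.0, 0.0, 0.0)
lats, lons, heights = ecef2lla([6378137.0, 0.0], [0.0, 6378137.0], [0.0, 0.0])
```

A NumPy version would apply the same formulas element-wise instead of looping over points in Python.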

eckendrick commented 2 months ago

I'm getting this error message when running pyParallelGamit.py

espg commented 2 months ago

...this is easy to fix incorrectly, but a fair bit harder to fix the right way.

The easy fix is to just copy the ecef2lla function from classes.Utils into plots.py.

The proper fix is quite a bit harder. It would probably involve changing the directory structure, i.e., renaming parallel_gamit to pgamit as discussed today, and then moving the classes directory under pgamit. Right now the scripts are run directly; that is, there's no import of the 'parallel_gamit' library itself. That is why the import fails: Python doesn't know the package name or the folder hierarchy. Earlier versions of Python (pre-3.6) would let you get away with these relative imports without __name__ or __package__ set, but newer versions enforce a sane, PEP 8-compliant layout for these kinds of imports.
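To make the failure mode concrete, here is a small, self-contained sketch. It builds a throwaway package tree in a temporary directory (pkg, Utils, plots, and the stub ecef2lla are hypothetical stand-ins, and the thread does not show which import line actually failed) and runs the same file containing a relative import twice: once directly, the way the scripts are run today, and once with python -m, where Python knows the parent package:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical miniature layout; the names mirror the discussion, not the real repo.
FILES = {
    "pkg/__init__.py": "",
    "pkg/classes/__init__.py": "",
    "pkg/classes/Utils.py": "def ecef2lla(x, y, z):\n    return 'stub'\n",
    "pkg/parallel_gamit/__init__.py": "",
    # A relative import only resolves when Python knows the parent package.
    "pkg/parallel_gamit/plots.py": (
        "from ..classes.Utils import ecef2lla\n"
        "print(ecef2lla(0, 0, 0))\n"
    ),
}


def build_tree(root: Path) -> None:
    """Write every file in FILES under root, creating directories as needed."""
    for rel, text in FILES.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)


def run(args, cwd):
    """Run the current Python interpreter with args and capture its output."""
    return subprocess.run([sys.executable, *args], cwd=cwd,
                          capture_output=True, text=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_tree(root)
    # 1) Run the file directly: __package__ is unset, so the relative import fails.
    direct = run(["pkg/parallel_gamit/plots.py"], cwd=root)
    # 2) Run it as a module: Python sets __package__ to "pkg.parallel_gamit".
    as_module = run(["-m", "pkg.parallel_gamit.plots"], cwd=root)

print(direct.returncode, direct.stderr.strip().splitlines()[-1])
print(as_module.returncode, as_module.stdout.strip())
```

The file contents are identical in both runs; only the way Python is told where the file sits in a package changes.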

Some of this mapping is also required for pip installation, so it's fine to do since we were planning on doing it anyway... but when pip does an install, it doesn't install a full image / copy of the git repo. If you look at the sklearn repo, the only thing installed when you install the Python package is a single directory named sklearn, which is unpacked inside site-packages for a given Python version. Each folder, starting with the base folder, has an __init__.py, which maps the directory structure of the entire software package.

We don't have any of that. Our top-level directory contains Python code in classes, com, and parallel_gamit (at least), and no __init__.py structure to tell Python what's in each module and submodule.

The easiest thing to do is probably to create a top-level Python package directory called pgamit and then move the classes, com, and parallel_gamit folders into it. That will at least keep the existing paths in the same relative hierarchy as they are now. Also, if we git clone the full repo, we can drop down into that directory and run previous scripts there with similar behavior... but it may still break some things, because it changes path locations.
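A sketch of the proposed layout, under the same caveats: everything lives in a temporary directory, and only the pgamit, classes, com, and parallel_gamit names come from this thread. With a single base package, plots.py can use an absolute, package-qualified import, provided the directory that contains pgamit/ is importable. Here that is done with PYTHONPATH; an editable pip install would be the packaged equivalent, which is assumed rather than shown:

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical sketch of the proposed layout: one top-level "pgamit" package
# with classes, com, and parallel_gamit moved underneath it.
LAYOUT = {
    "pgamit/__init__.py": "",
    "pgamit/classes/__init__.py": "",
    "pgamit/classes/Utils.py": "def ecef2lla(x, y, z):\n    return 'stub'\n",
    "pgamit/com/__init__.py": "",
    "pgamit/parallel_gamit/__init__.py": "",
    # Absolute, package-qualified import instead of copying the function.
    "pgamit/parallel_gamit/plots.py": (
        "from pgamit.classes.Utils import ecef2lla\n"
        "print(ecef2lla(0, 0, 0))\n"
    ),
}


def run_script(root, extra_path=None):
    """Run plots.py directly, optionally adding extra_path to PYTHONPATH."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)          # start from a clean search path
    if extra_path is not None:
        env["PYTHONPATH"] = str(extra_path)
    script = root / "pgamit" / "parallel_gamit" / "plots.py"
    return subprocess.run([sys.executable, str(script)], env=env,
                          capture_output=True, text=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel, text in LAYOUT.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
    without_base = run_script(root)                  # "pgamit" is not importable
    with_base = run_script(root, extra_path=root)    # directory holding pgamit/ is on the path

print(without_base.returncode, with_base.returncode, with_base.stdout.strip())
```

Because the import names the base package explicitly, the same line works whether the script is run directly or imported from elsewhere, as long as pgamit itself can be found.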

@demiangomez the thing to do is probably to set up the reorganization before the clustering commits, pin that as a prior version to test against (1.1.24), and then apply the new commits in the next version bump (1.2.1). I'll see if I can get a working testing branch for this early next week. If we need production capabilities sooner, we can do the quick / easy fix before then to get things running, and I can sandbox and test the better solution in parallel to replace the quick fix when it's ready for testing.

demiangomez commented 2 months ago

@espg just change the ecef2lla import as I suggested in the comment. This will fix the issue for the time being.

espg commented 2 months ago

@demiangomez I brought the ecef2lla function into the plot code to get this working for now.

We can't directly import ecef2lla with the current directory structure into any scripts in the parallel_gamit directory. We can import things within the same directory, i.e., import anything from within the parallel_gamit directory from one file to another. We can also do the same within the classes directory. But we can't import functions from classes into parallel_gamit or vice versa until the module/package name is fixed and there is a base directory defined. "Parallel.GAMIT" is the project directory, but not the base Python directory, and we'll need to define a single base Python directory under "Parallel.GAMIT" where 'classes', 'com', 'parallel_gamit', etc. live.

That said, the last commit fixes the current issue in the meantime and can be merged so that we can set up production runs with the new plots.

demiangomez commented 2 months ago

I don't understand why you can't just import Utils if classes is in the PYTHONPATH. Did this change with the newer version of Python? I don't think so, because otherwise ALL of PGAMIT would have stopped working. Copying the function into plots is fine, but again, just importing Utils also works because of how the environment is configured on the cluster.
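For comparison, a sketch of the setup described here: with no base package, a plain "from Utils import ecef2lla" works from any script as long as the classes directory itself is on PYTHONPATH. The layout and stub function are hypothetical. The point is that this lookup depends only on sys.path, whereas package-qualified or relative imports (the kind discussed in the earlier comments) additionally need a base package; which of the two the reported error came from is not shown in the thread:

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical flat layout resembling the current repository: no top-level
# package, just sibling "classes" and "parallel_gamit" directories.
FILES = {
    "classes/Utils.py": "def ecef2lla(x, y, z):\n    return 'stub'\n",
    # Plain top-level import: works only if the classes directory is on sys.path.
    "parallel_gamit/plots.py": "from Utils import ecef2lla\nprint(ecef2lla(0, 0, 0))\n",
}


def run_plots(root, pythonpath=None):
    """Run plots.py directly with an optional PYTHONPATH entry."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)          # ignore whatever the caller had set
    if pythonpath is not None:
        env["PYTHONPATH"] = str(pythonpath)
    return subprocess.run(
        [sys.executable, str(root / "parallel_gamit" / "plots.py")],
        env=env, capture_output=True, text=True,
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel, text in FILES.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
    no_path = run_plots(root)                                 # Utils cannot be found
    with_classes = run_plots(root, pythonpath=root / "classes")  # cluster-style setup

print(no_path.returncode, with_classes.returncode, with_classes.stdout.strip())
```

Both approaches can coexist: the PYTHONPATH setup keeps existing scripts working on the cluster, while a base package is what a pip-installable layout would need.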