PIC-IRIS / PH5

Library of PH5 clients, apis, and utilities

PH5 Repo / Documentation Needs Work #63

Closed nick-falco closed 3 years ago

nick-falco commented 7 years ago

The working group made a good point that an arbitrary user should be able to install and use the various PH5 APIs and utility code. Currently there is not enough documentation for them to realistically do this.

Below are a few ideas for improving the repo. Please let me know what you think.

Feel free to add to this list. I think several of these items could be done fairly easily and would make PH5 much more appealing/useful to outsiders.

derick-hess commented 7 years ago

That's a good idea. I now have an install script for Ubuntu and CentOS that checks for and installs all OS and Python dependencies, then grabs the latest PH5 from GitHub and installs it. Better from-scratch installation instructions can also be added. (I still need to modify the script so the user has the option of creating a virtual environment.)
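
For illustration, the Python side of such a dependency check could look roughly like the sketch below. This is not the actual install script: the function name `missing_modules` and the module names in the example list are assumptions, and the project's real dependency list would have to come from elsewhere (for example environment.yml).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Placeholder names for illustration only; not the confirmed PH5 dependency list.
    required = ["numpy", "tables"]
    for name in missing_modules(required):
        print("missing Python dependency:", name)
```

A check like this only reports what is missing; installing the packages would still be left to pip, conda, or the OS package manager.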

Reordering everything into directories is definitely needed. I see a few things that can be removed, but almost all of it is needed to fully interact with PH5 in various ways.

I think the most logical way to do it may be:

We have 2 documents our data group is currently working on that will be added to the wiki:

A while back I wrote a basic tutorial on using ph5API.py to write your own code to interact with an archive. I will update this and eventually write more in-depth tutorials on using ph5API.

Nick, if you want, you can start writing a tutorial on using the command-line API tools, and Steve and/or I can add to it.

The more in-depth documentation we have, the better.

rsdeazevedo commented 7 years ago

Also, there was an e-mail from David Okaya that I will forward to Derick.

rsdeazevedo commented 7 years ago

FYI, hi all,

toward the end of our recent working group call, John brought up the ASDF seismic file format that is also based on HDF5. An actual journal article on ASDF came out last year in GJI (a scientific journal). See attached. ASDF also has a web site [ https://seismic-data.org ]; this group is proactive in publicizing what they have created.

When I was reading this article, it brought up two points for PH5.

(1) This GJI article describes ASDF at a higher level and is not a technical document (i.e., the forest, not the trees); it is aimed at potential users of ASDF. That is, users representing their own internal group or making decisions for a broader community of users.
At some future point, PH5 needs a similar type of document that is openly available to people who want to assess if they should use it. This PH5 document needs to describe what it is and what is its intended usage.

(2) Describing what PH5 is designed for requires us to understand the main purpose of PH5. Is it primarily a behind-the-scenes archival format that PASSCAL-DMC uses internally, one the user community will always be shielded from because of the extraction services provided by DMC? Or is part of its design objective for users to obtain and manipulate data directly from PH5 files? Something in between?
The ASDF article points out that ASDF could be used for archiving, for data exchange, or for data types beyond seismic data, although the authors emphasize usage by the analysis community. PH5 has been anecdotally described as being similarly broadly applicable. If the functionalities of ASDF and PH5 appear similar, this is another reason to articulate the main purpose of PH5. If they accomplish the same thing, do we need both? Is the purpose of PH5 sufficiently different that they are not in competition? If PH5 is primarily for internal IRIS use, the existence of ASDF is not an issue. If PH5 is intended to be pushed toward the user community, this raises a duplication-of-effort issue.

-David

PS. By the way, the astronomy community has created their own HDF5-based data exchange format, which they also named ASDF.

nick-falco commented 7 years ago

@derick-hess and @rsdeazevedo please see my start on outlining a proposed repo structure below. We should work on mapping out where every file in the repo will belong. I'm also a bit unsure how the install will work once the files are in different directories.

/ <-- root
/README.md
/LICENSE
/CHANGELOG.txt <-- for  tracking changes between versions
/environment.yml
/install.py
/runtests.py
/.travis.yml
/.gitignore
.../ph5/... <-- I suggest changing kitchen to ph5
   .../  ------ <-- globally used resources would go under /ph5
   .../utilities/ <-- utilities for modifying an existing ph5 experiment
   .../core/  <-- core tools for creating a ph5 experiment
   .../clients/ <-- clients for viewing and interacting with a ph5 experiment
                  .../apis/ <-- APIs for extracting data including dependencies
                      .../ph5API.py
                      .../ph5tomsAPI.py
                      .../ph5tostationxml.py
                      .../ph5toexml.py
                      .../ph5toseg.py
                      .../ph5utils.py
                      .../test_ph5tomsAPI.py
                      .../test_ph5utils.py
                      -- Also include dependencies here:
                      .../TimeDOY.py
                      .../columns.py
                      .../cs2cs.py
                      .../Experiment.py
                      .../decimate.py
                      .../SEGYFactory.py
                      .../ etc.
                  .../PH5View/ <--- ph5 viewer source code
                      .../PH5ReaderwVispyAPI.py
                      .../PH5Viewer.cfg
                      .../PH5ViewerwVispyAPI.py

A few notes on the install script (https://github.com/PIC-IRIS/PH5/blob/master/kitchen/install.py):

rsdeazevedo commented 7 years ago

Maybe we should move to a more standard install using distutils? The C modules already use distutils to compile.

On 6/15/2017 5:02 PM, Nick Falco wrote:


  • /ph5/utilities/ and /ph5/core/ would probably need to be broken down even further.
  • Even though there is only one program for viewing PH5, I still think this should be in its own directory under clients.
  • @derick-hess Could ph5tosac.py and dumpsac.py be removed, since they are replaced by ph5tomsAPI.py?

A few notes on the install script (https://github.com/PIC-IRIS/PH5/blob/master/kitchen/install.py):

  • I don't think that it installs the Python dependencies. I created the environment.yml file for installing the Python dependencies with Anaconda. If the goal is to have the install.py script also install the Python dependencies, then it should reference this file.
  • I'm not sure the script handles all of the dependencies. Dependencies built from C code (fir.h, firfilt_py.c, firfiltwrapper_py.c, ibm2ieee_py.c, ibm2ieeewrapper_py.c, etc.) need a custom build step.


nick-falco commented 7 years ago

> Maybe we should move to a more standard install using distutils? The C modules already use distutils to compile.

@rsdeazevedo This is a good idea. I see no point in reinventing the wheel here.

What do you think of the proposed repo layout?

You know the PH5 software suite much better than I do, so your guidance will be critical to better organizing the repo.

derick-hess commented 7 years ago

Steve is going over the directory structure today and looking at all the files to determine the best layout.

I've already removed ph5tosac but dumpSAC should probably stay. dumpsac is used for checking header information and can be pretty useful.

I'll check with our data group on their progress with the short and long PH5 documentation and load it into the wiki as soon as I can.

rsdeazevedo commented 7 years ago

My comments below... GitHub-reorg-2017.167.a.docx

nick-falco commented 7 years ago

Thanks Steve. I like your redesign.

Below is the directory structure based on feedback from @rsdeazevedo

/ <-- root
/README.md
/LICENSE
/CHANGELOG.txt <-- for  tracking changes between versions
/environment.yml
/install.py
/runtests.py
/.travis.yml
/.gitignore
.../ph5/... <-- I suggest changing kitchen to ph5
   .../  ------ <-- globally used resources would go under /ph5
   .../utilities/ <-- utilities for modifying or creating an existing ph5 experiment.
                      (xxxxtoph5 programs, pforma, kefedit, noven etc.)
   .../core/  <-- I would put the library like things here: 
                  (Experiment.py, columns.py, TimeDOY.py, ph5API.py, ph5utils.py, etc...)
   .../clients/ <-- clients for viewing and interacting with a ph5 experiment
                  .../ph5tomsAPI.py
                  .../ph5tostationxml.py
                  .../ph5toexml.py
                  .../ph5toseg.py
                  .../ph5utils.py
                  .../test_ph5tomsAPI.py
                  .../ etc.
                  .../PH5View/ <--- ph5 viewer source code (Put all GUI programs in subdirectories)
                      .../PH5ReaderwVispyAPI.py
                      .../PH5Viewer.cfg
                      .../PH5ViewerwVispyAPI.py

nick-falco commented 7 years ago

- Since ph5tomsAPI.py, ph5tostationxml.py, and ph5toexml.py use the modules in /core how will importing these modules work? Maybe the core modules should be in a directory under the APIs. For example:

/
/....
/ph5/
~~/ph5/clients/~~
/ph5/.... <--- instead all clients (apis) are at the /ph5/ level:
(ph5tomsAPI.py, ph5tostationxml.py, etc.)
.../guis/... <--- all GUI programs in separate directories here
.../core/  <-- I would put the library like things here: 
(Experiment.py, columns.py, TimeDOY.py, ph5API.py, ph5utils.py, etc...)
.../utilities/ <-- utilities for modifying or creating an existing ph5 experiment.
(xxxxtoph5 programs, pforma, kefedit, noven etc.)

Actually on second thought I could just change the paths in the Web Services to reflect the new layout. The originally proposed design would be fine. I removed this from the previous comment.

nick-falco commented 7 years ago

@derick-hess If you like Steve's repo design I think we should go with that.

derick-hess commented 7 years ago

Yeah, it makes sense to me and I think Steve has the best understanding of how all the pieces work. I'll get on implementing the structure change. I'll commit that this afternoon.

rsdeazevedo commented 7 years ago

Nick;

Wouldn't it just be import ph5.clients.ph5tomsAPI if the __init__.py files are there?

steve.


nick-falco commented 7 years ago

It would be from ph5.clients import ph5tomsAPI. My one concern is that the ph5/ directory would not be on my Python path for the web services. This would cause imports such as ph5utils from /ph5/core/ to fail when I copy the code into the web services.

i.e., in ph5tomsAPI.py you would have from ph5.core import ph5utils, but ph5/ would not be on the Python path for the web services.
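
The import concern can be reproduced with a small, self-contained sketch. It builds a stand-in ph5/ tree in a temporary directory (the file contents are placeholders, not real PH5 code) and shows that the client import only resolves once the directory containing ph5/ is on the Python path. The helper names `build_stub_package` and `try_import` are made up for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_stub_package(root):
    """Create a stand-in ph5/ package tree with empty __init__.py files."""
    for pkg in ("ph5", "ph5/core", "ph5/clients"):
        (root / pkg).mkdir(parents=True, exist_ok=True)
        (root / pkg / "__init__.py").write_text("")
    # Placeholder modules that mimic the import relationship discussed above.
    (root / "ph5/core/ph5utils.py").write_text(
        "def where():\n    return 'ph5.core.ph5utils'\n")
    (root / "ph5/clients/ph5tomsAPI.py").write_text(
        "from ph5.core import ph5utils\n\n"
        "def utils_location():\n    return ph5utils.where()\n")


def try_import(name):
    """Return the imported module, or None if it cannot be found."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


root = Path(tempfile.mkdtemp())
build_stub_package(root)

# Fails unless a ph5 package already happens to be installed.
before = try_import("ph5.clients.ph5tomsAPI")

# Put the directory that CONTAINS ph5/ on the path, then retry.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
after = try_import("ph5.clients.ph5tomsAPI")

print("importable before path change:", before is not None)
print("client resolved core module:", after.utils_location())
```

Installing the project as a package has the same effect as the `sys.path` line without each consumer having to manage paths by hand.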

nick-falco commented 7 years ago

It would be best to package the ph5 project like other python libraries (like obspy). This way it can be installed in a standardized way.

I strongly suggest reading the following: https://python-packaging.readthedocs.io/en/latest/ paying special attention to https://python-packaging.readthedocs.io/en/latest/command-line-scripts.html

rsdeazevedo commented 7 years ago

Yes, this is a must read. Thanks.

nick-falco commented 7 years ago

From what I've read, it looks like setuptools is the preferred python packaging tool. https://setuptools.readthedocs.io/en/latest/setuptools.html
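
As a rough idea of what a setuptools-based setup.py might contain for the layout discussed in this thread, here is a minimal sketch. Everything in it is an assumption for illustration: the version string is a placeholder, the package list mirrors the proposed directories, and the console script entries assume each client module exposes a `main()` function, which has not been confirmed here.

```python
def build_setup_kwargs():
    """Keyword arguments that a setup.py could pass to setuptools.setup()."""
    return {
        "name": "ph5",
        "version": "0.0.0",  # placeholder version
        # Mirrors the proposed layout; each directory needs an __init__.py.
        "packages": ["ph5", "ph5.core", "ph5.clients", "ph5.utilities"],
        "entry_points": {
            "console_scripts": [
                # Hypothetical: assumes a main() function in each module.
                "ph5tomsAPI = ph5.clients.ph5tomsAPI:main",
                "ph5tostationxml = ph5.clients.ph5tostationxml:main",
            ],
        },
    }


# In an actual setup.py this would be followed by:
#     from setuptools import setup
#     setup(**build_setup_kwargs())
```

With console script entry points, the command-line tools are installed onto the user's PATH, so nobody needs to know where the .py files live.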

Do you guys have backups of the files we deleted from the Git repo? Once they're backed up, I think we could update the refactor branch and overwrite master. At that point we can start working on packaging the project on a different branch.

rsdeazevedo commented 7 years ago

There are setup.py scripts to handle compiling the modules written in C. The filenames all start with 'su' and end in '_py.py'.

nick-falco commented 7 years ago

Thanks Steve. I'm working on adding these now. To keep things a little better organized, I'm going to move these files, along with the C code, to a new subdirectory of ph5/core/ called c_dependencies.

I think the setup.py script will then just need to import the subcd_py.py, sufirfilt_py.py, suibm2ieee_py.py, surt_130_py.py, and surt_125a_py.py scripts when it is run for these to install.
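
An alternative to importing each su*_py.py script is to declare the C modules directly in the top-level setup.py. The sketch below is only a guess at how that could look: the grouping of C sources into modules is inferred from the file names mentioned earlier in this thread, only two of the modules are shown, and the dotted module names are assumptions.

```python
# Grouping of sources per module is inferred from file names, not confirmed.
C_SOURCES = {
    "firfilt_py": ["firfilt_py.c", "firfiltwrapper_py.c"],
    "ibm2ieee_py": ["ibm2ieee_py.c", "ibm2ieeewrapper_py.c"],
}


def extension_specs(base_dir="ph5/core/c_dependencies"):
    """Return (dotted module name, source paths) pairs for each C module."""
    package = base_dir.replace("/", ".")
    return [
        (package + "." + name, [base_dir + "/" + src for src in sources])
        for name, sources in sorted(C_SOURCES.items())
    ]


def build_extensions():
    """Turn the specs into setuptools Extension objects (requires setuptools)."""
    from setuptools import Extension
    return [Extension(name, sources=srcs) for name, srcs in extension_specs()]


# In setup.py the result would be passed as setup(..., ext_modules=build_extensions()).
```

Declaring the extensions in one place would let a single install command compile the C code together with the rest of the package.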

nick-falco commented 7 years ago

I've added the following page to the PH5 Wiki outlining how to request data using the PH5 Web Services.

https://github.com/PIC-IRIS/PH5/wiki/PH5-Web-Services

nick-falco commented 7 years ago

I added a custom sidebar navigation to the wiki. The automatically generated sidebar was confusing.

nick-falco commented 7 years ago

There should be a page in the wiki that has a list of all command line tools along with a description of what they are used for. We can't possibly expect someone to be able to use PH5 without knowing what any of the scripts do.
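
One possible way to bootstrap such a page is to pull the first line of each script's module docstring into a table. The sketch below assumes the scripts carry module docstrings, which may not be true for every file; the function names are made up for this example, and files that fail to parse under the running Python version are flagged rather than skipped.

```python
import ast
from pathlib import Path


def summarize_scripts(directory):
    """Map each .py file in directory to the first line of its module docstring."""
    summaries = {}
    for path in sorted(Path(directory).glob("*.py")):
        try:
            tree = ast.parse(path.read_text())
        except SyntaxError:
            # e.g. Python 2 syntax parsed by a Python 3 interpreter
            summaries[path.name] = "(could not parse)"
            continue
        doc = ast.get_docstring(tree)
        summaries[path.name] = doc.splitlines()[0] if doc else "(no description)"
    return summaries


def as_wiki_table(summaries):
    """Render the summaries as a Markdown table suitable for a wiki page."""
    lines = ["| Tool | Description |", "| --- | --- |"]
    lines += ["| %s | %s |" % item for item in summaries.items()]
    return "\n".join(lines)
```

Entries that come back as "(no description)" double as a to-do list of scripts that still need a docstring.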

derick-hess commented 7 years ago

Bruce actually just made a document with this information in it. It has a quick description of most of the files and what they do. I'll copy that info into a wiki page and maybe expand on it a bit

nick-falco commented 7 years ago

Another thing I noticed was that the system requirements are pretty restrictive (CentOS only):

https://github.com/PIC-IRIS/PH5/wiki/PH5-Requirements#system-requirements

I'm not sure if all of these are actually requirements. I can run the PH5 code on my Mac, and I don't think I installed all of these. We should see if we can remove any items from this list.

If these really all are dependencies then we should provide instructions for other operating systems.

derick-hess commented 7 years ago

Most of these are required for creating PH5 archives. They all install fine on Ubuntu as well, with the exception of webkit. We need to get Lan to rewrite ph5viewer without webkit, since it is not considered safe and most Linux distributions have removed it from their package managers.

Anaconda also comes with most of the libraries compiled as well. I'm fairly certain atlas, hdf5 and geos are included with anaconda.

nick-falco commented 7 years ago

I'll do some reading about the best way to present these dependencies to users. Having a list of yum install <package-name> commands doesn't seem like the best way to do this.

derick-hess commented 7 years ago

Added the tables from the working group document to the wiki

nick-falco commented 6 years ago

Are there finalized versions of the PH5 in a Nut Shell and Long documents? We should put these on the GitHub page before AGU.

I'm also creating a page on the DS website that provides instructions for submitting PH5 experiments to the DMC.

rsdeazevedo commented 6 years ago

Nick, Alissa is working on this and plans on having it finished before AGU.


ascire-pic commented 3 years ago

Closing this issue due to beginning development of GeoHDF