a-renzini / pygwb

MIT License
3 stars 3 forks source link

[JOSS review] Improve documentation #1

Closed Sbozzolo closed 9 months ago

Sbozzolo commented 1 year ago

(Main review for JOSS happening here)

The current documentation is very very minimal and needs to be expanded to describe capabilities and usage.

More specifically, here are some of the issues I've seen:

This list is not comprehensive and I would recommend designing the documentation from scratch.

Here is a good resource for a possible way to write effective documentation: https://documentation.divio.com/

a-renzini commented 1 year ago

Thank you! We are hard at work addressing this issue. Could this be turned into a checklist by any chance?

cmbiwer commented 1 year ago

I went through before looking at @Sbozzolo review. I wrote down basically the same issues and agree with @Sbozzolo . There is a lot missing from the documentation/tutorial and things currently point to URLs that require LIGO credentials.

I'll add a few more things though.

I installed pytest myself (see the comments above, its not there) and tried the test suite. I see two test fail; TestInterferometer.test_set_timeseries_from_channel_name and TestInterferometer.test_from_parameters. The message (truncated):

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

fobj = '/var/folders/7b/9dh6yr6n6d9dvwlc33g1_3rw0000gp/T/astropy-download-92331-u335cim9', args = (), kwargs = {'end': LIGOTimeGPS(1247644588, 0), 'start': LIGOTimeGPS(1247644169, 0), 'tag': 'C00'}
h5f = <Closed HDF5 file>

    @wraps(func)
    def decorated_func(fobj, *args, **kwargs):
        # pylint: disable=missing-docstring
        if not isinstance(fobj, h5py.HLObject):
            if isinstance(fobj, FILE_LIKE):
                fobj = fobj.name
            with h5py.File(fobj, 'r') as h5f:
>               return func(h5f, *args, **kwargs)
E               TypeError: Failed to read GWOSC data from 'https://gwosc.org/archive/data/O3a_16KHZ_R1/1246756864/H-H1_GWOSC_O3a_16KHZ_R1-1247641600-4096.hdf5': read_gwosc_hdf5() got an unexpected keyword argument 'tag'

../../../opt/anaconda3/envs/pygwb/lib/python3.8/site-packages/gwpy/io/hdf5.py:59: TypeError

Aside from the issues already mentioned. It says to do:

conda install .

From the dir above the repo.

That didn't work either there. And even cd into in the repo, it didn't work. Conda couldn't find the package; eg.:

$ conda install .
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  - .

Current channels:

  - https://conda.anaconda.org/conda-forge/osx-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.anaconda.com/pkgs/main/osx-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/osx-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

I ended up using python -m pip install . to try using the package.

It may just be my Anaconda version. But its not stated anywhere what version is required.

cmbiwer commented 1 year ago

One more with the paper itself. The state of the field is a requirement in the paper. But there wasn't really any mention of the state of the field (eg. LALStochastic) and other packages, and what separates pygwb from those packages.

a-renzini commented 1 year ago

@Sbozzolo and @cmbiwer : thanks for your feedback! Several of the issues you point out are easy to resolve and we will update the documentation accordingly; there are a couple though that I am unsure how to address so let me post questions here.

a-renzini commented 1 year ago

I went through before looking at @Sbozzolo review. I wrote down basically the same issues and agree with @Sbozzolo . There is a lot missing from the documentation/tutorial and things currently point to URLs that require LIGO credentials.

I'll add a few more things though.

  • [ ] Test suite seems to fail.

I installed pytest myself (see the comments above, its not there) and tried the test suite. I see two test fail; TestInterferometer.test_set_timeseries_from_channel_name and TestInterferometer.test_from_parameters. The message (truncated):

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

fobj = '/var/folders/7b/9dh6yr6n6d9dvwlc33g1_3rw0000gp/T/astropy-download-92331-u335cim9', args = (), kwargs = {'end': LIGOTimeGPS(1247644588, 0), 'start': LIGOTimeGPS(1247644169, 0), 'tag': 'C00'}
h5f = <Closed HDF5 file>

    @wraps(func)
    def decorated_func(fobj, *args, **kwargs):
        # pylint: disable=missing-docstring
        if not isinstance(fobj, h5py.HLObject):
            if isinstance(fobj, FILE_LIKE):
                fobj = fobj.name
            with h5py.File(fobj, 'r') as h5f:
>               return func(h5f, *args, **kwargs)
E               TypeError: Failed to read GWOSC data from 'https://gwosc.org/archive/data/O3a_16KHZ_R1/1246756864/H-H1_GWOSC_O3a_16KHZ_R1-1247641600-4096.hdf5': read_gwosc_hdf5() got an unexpected keyword argument 'tag'

../../../opt/anaconda3/envs/pygwb/lib/python3.8/site-packages/gwpy/io/hdf5.py:59: TypeError
  • [ ] Installation instructions seem incomplete.

Aside from the issues already mentioned. It says to do:

conda install .

From the dir above the repo.

That didn't work either there. And even cd into in the repo, it didn't work. Conda couldn't find the package; eg.:

$ conda install .
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  - .

Current channels:

  - https://conda.anaconda.org/conda-forge/osx-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.anaconda.com/pkgs/main/osx-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/osx-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

I ended up using python -m pip install . to try using the package.

It may just be my Anaconda version. But its not stated anywhere what version is required.

The failure of the test suite here seems to be due to a gwpy issue which (at least to my knowledge) was solved a while ago. Could you upload the full logs so I know what version you are running, etc?

Also, we run the testing suite in the pygwb gitlab repo, so that is why the CI badges are broken in the github version. Should I duplicate the testing also in github? It seems like a bit of a waste of computational resources. The gitlab repo is now public so you can see the test coverage here: https://git.ligo.org/pygwb/pygwb

a-renzini commented 1 year ago

@Sbozzolo the reason there are pickle files as a part of the repository is because some of our tests depend on them; they are not part of the codebase itself (when packaging the code for distribution the test suite is omitted). Our objects are designed as data containers, at different stages of a GWB analysis, and the test suite is based on pre-compiled pickled versions of these objects. We have found this an effective way of keeping track of changes in the modules and objects, without having the test suite re-run the analysis every time.

a-renzini commented 1 year ago

Now addressed here: 07926e35901c09130752054cab414bc18a352a0c

a-renzini commented 1 year ago

Now addressed here: 7ea65ea8621ae952e368bd8b2dd81e88abeefca6

a-renzini commented 1 year ago

@cmbiwer re:

I tracked this issue back and it is due to a backwards-incompatible change in the gwosc package; this was fixed in pygwb v.1.1, but the version you see here is the one associated to the ApJ paper, and thus does not have this change. I have gone ahead and updated (and cleaned up) the code here, removing the tag argument which was deprecated by gwosc. We will merge these changes into pygwb once the review is over. The documentation should now run.

See 213fa0cd7bbba3b2acc682977cd043209ca1a01d

(I could also have fixed the version of gwosc to an older one, to make the old v1.0.0 work; however, as I expect the changes we make here will eventually be merged with the current version of the code, I thought it would be more useful to update the code this way. I am also happy to go the other way.)

cmbiwer commented 1 year ago

The failure of the test suite here seems to be due to a gwpy issue which (at least to my knowledge) was solved a while ago. Could you upload the full logs so I know what version you are running, etc?

This is the version it installed for me:

(pygwb) pn1618376:pygwb cmbiwer$ python
Python 3.8.16 | packaged by conda-forge | (default, Feb  1 2023, 16:05:36) 
[Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pygwb
>>> pygwb.__version__
'1.0.1.dev4'
a-renzini commented 1 year ago

The failure of the test suite here seems to be due to a gwpy issue which (at least to my knowledge) was solved a while ago. Could you upload the full logs so I know what version you are running, etc?

This is the version it installed for me:

(pygwb) pn1618376:pygwb cmbiwer$ python
Python 3.8.16 | packaged by conda-forge | (default, Feb  1 2023, 16:05:36) 
[Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pygwb
>>> pygwb.__version__
'1.0.1.dev4'

sorry - I meant gwpy version; in any case, I addressed this issue now in 213fa0cd7bbba3b2acc682977cd043209ca1a01d as mentioned above - please let me know whether the test suite now works with these changes :)

(note that the issue was coming from the fact that gwpy depends on gwosc, and although the version of gwpy installed by pygwb v1.0.0 is correct, gwpy doesn't fix the version of gwosc it needs and the break was coming from updates in gwosc)

Sbozzolo commented 1 year ago

Thank you! We are hard at work addressing this issue. Could this be turned into a checklist by any chance?

Sorry for the delay. I changed the list into a checklist. In this, let me remark again that this list summarizes my first impressions, so I'd encourage you to also think about the documentation beyond these points.

@Sbozzolo the reason there are pickle files as a part of the repository is because some of our tests depend on them; they are not part of the codebase itself (when packaging the code for distribution the test suite is omitted). Our objects are designed as data containers, at different stages of a GWB analysis, and the test suite is based on pre-compiled pickled versions of these objects. We have found this an effective way of keeping track of changes in the modules and objects, without having the test suite re-run the analysis every time.

I was under the impression that pickle files were not portable across versions/OSs, but it turns out that they are, as long as the same protocol is being used. Nonetheless, I think that sharing pickles is not the best idea. As the documentation clearly remarks, pickles are not safe. Moreover, they are opaque and can only be read by Python. Finally, if you change the object in your code in a future version (or if an upstream package changes its code), the pickle might stop working. If it is possible and not too difficult, I'd suggest moving to something like JSON. Also, if used only for tests, the file should not be at the root level of your repository.

Now addressed here: https://github.com/a-renzini/pygwb/commit/07926e35901c09130752054cab414bc18a352a0c

What branch should I look at when reviewing the code/documentation?

a-renzini commented 1 year ago

Sorry for the delay. I changed the list into a checklist. In this, let me remark again that this list summarizes my first impressions, so I'd encourage you to also think about the documentation beyond these points.

Thank you - yes I understand the documentation is very lacking, we are working hard on it. Apologies it is taking quite some time, we will keep updating here as we go along!

I was under the impression that pickle files were not portable across versions/OSs, but it turns out that they are, as long as the same protocol is being used. Nonetheless, I think that sharing pickles is not the best idea. As the documentation clearly remarks, pickles are not safe. Moreover, they are opaque and can only be read by Python. Finally, if you change the object in your code in a future version (or if an upstream package changes its code), the pickle might stop working. If it is possible and not too difficult, I'd suggest moving to something like JSON. Also, if used only for tests, the file should not be at the root level of your repository.

Apologies - I don't quite understand what you mean by "the file should not be at the root level of your repository". Should these be in a separate branch? Every time we update a part of the code, indeed the pickles "stop working", and we update them accordingly. We've found this to be a useful check of the structure of the code. I'm afraid moving to JSON, while surely possible, would remove a layer of the checks we do, so I would be slightly against that.

What branch should I look at when reviewing the code/documentation?

Please refer to the joss_submission branch.

Sbozzolo commented 1 year ago

Sorry for the delay. I changed the list into a checklist. In this, let me remark again that this list summarizes my first impressions, so I'd encourage you to also think about the documentation beyond these points.

Thank you - yes I understand the documentation is very lacking, we are working hard on it. Apologies it is taking quite some time, we will keep updating here as we go along!

Thank you! There's no hurry, the goal of this review is to help you built and publish an awesome package :)

Apologies - I don't quite understand what you mean by "the file should not be at the root level of your repository". Should these be in a separate branch?

The file is at the same level as the directories "docs", "pygwb", ....

However, this file is very technical and is used only for tests (as I understand). Wouldn't it more appropriate to put it in the "test" folder? This would immediately give users a sense of what the file is about.

Every time we update a part of the code, indeed the pickles "stop working", and we update them accordingly. We've found this to be a useful check of the structure of the code. I'm afraid moving to JSON, while surely possible, would remove a layer of the checks we do, so I would be slightly against that.

In the interest of maintainability of the package, I think it would be best to have readers/writers to standard formats, and unit tests over those readers and writers to restore the layer of checks. This would be a more robust solution for two reasons: (1) you use a stable and clear format, as opposed to an opaque pickle, (2) new developers would see the reader/writer functions with their unit tests, and they would be able to more easily understand what is going on and extend the package (suppose a test fails with a pickle error, how would the developer know what to do? If you have reader/writers with tests, the failing test would immediately point the developer to the right direction).

Of course, this is my suggestion, and you are free to pick your preferred implementation. In any case, make sure to document everything.

Sbozzolo commented 1 year ago

Addendum: I realized that there are more pickle files in the tests folder. I had "checkpoint.pkl" in mind when writing the message.

a-renzini commented 1 year ago

Addendum: I realized that there are more pickle files in the tests folder. I had "checkpoint.pkl" in mind when writing the message.

oops - that file is NOT meant to be there. All the pickles used for checks should be in test/test_data! We must have forgotten to gitignore it...

a-renzini commented 1 year ago

Sorry for the delay. I changed the list into a checklist. In this, let me remark again that this list summarizes my first impressions, so I'd encourage you to also think about the documentation beyond these points.

Thank you - yes I understand the documentation is very lacking, we are working hard on it. Apologies it is taking quite some time, we will keep updating here as we go along!

Thank you! There's no hurry, the goal of this review is to help you built and publish an awesome package :)

Apologies - I don't quite understand what you mean by "the file should not be at the root level of your repository". Should these be in a separate branch?

The file is at the same level as the directories "docs", "pygwb", ....

However, this file is very technical and is used only for tests (as I understand). Wouldn't it more appropriate to put it in the "test" folder? This would immediately give users a sense of what the file is about.

Every time we update a part of the code, indeed the pickles "stop working", and we update them accordingly. We've found this to be a useful check of the structure of the code. I'm afraid moving to JSON, while surely possible, would remove a layer of the checks we do, so I would be slightly against that.

In the interest of maintainability of the package, I think it would be best to have readers/writers to standard formats, and unit tests over those readers and writers to restore the layer of checks. This would be a more robust solution for two reasons: (1) you use a stable and clear format, as opposed to an opaque pickle, (2) new developers would see the reader/writer functions with their unit tests, and they would be able to more easily understand what is going on and extend the package (suppose a test fails with a pickle error, how would the developer know what to do? If you have reader/writers with tests, the failing test would immediately point the developer to the right direction).

Of course, this is my suggestion, and you are free to pick your preferred implementation. In any case, make sure to document everything.

I like this suggestion. Our data writer functions can write to multiple formats, pickle is just one of them. So as I said, storing/checking JSONs (or other) is no problem. However, what I was originally thinking of adding is a test which runs our pipeline and writes the object pickle file, which then can be used by other tests to check its content. This would avoid storing static pickle files, but would maintain the test of the object structure. What are your thoughts on this?

Sbozzolo commented 1 year ago

I like this suggestion. Our data writer functions can write to multiple formats, pickle is just one of them. So as I said, storing/checking JSONs (or other) is no problem. However, what I was originally thinking of adding is a test which runs our pipeline and writes the object pickle file, which then can be used by other tests to check its content. This would avoid storing static pickle files, but would maintain the test of the object structure. What are your thoughts on this?

That's excellent!

a-renzini commented 1 year ago

Hello @Sbozzolo and @cmbiwer - apologies on the delay. I'm making a list here of the things we have addressed which you can take a look at, as we continue working on documentation+. I remember I had a few things to improve in the unit tests as well, I'm working on it!

READY (ish)

We're working on restructuring the docs, essentially turning some of the tutorials into actual pages and leaving some remaining tutorials as "extras". The idea is that the API pages present each module/object, but we will add pages that describe how these are then used for analysis, and clarify the objectives/deliverables of the package, as well as how to obtain them. We will probably add about one page per script present in the pygwb_pipe folder (these are the scripts used in our day to day analyses)

kevinturbang commented 1 year ago

Hello @Sbozzolo and @cmbiwer,

We have made significant progress in updating and refining our code and the documentation. Concretely,

At this point of the review process, it would be most useful if you could provide feedback on the work that has been done so far, while we finalize the documentation. This will allow us to incorporate any proposed changes as we are making these final modifications to the documentation.

Sbozzolo commented 1 year ago

Hi @kevinturbang.

Thank you for your hard work. I'll review the new documentation over the weekend and provide my feedback.

Sbozzolo commented 1 year ago

Should I still be looking at the joss_submission branch even if the last commit was on Jun 21?

a-renzini commented 1 year ago

@Sbozzolo no - we are now working directly in the gitlab as suggested: https://pygwb.docs.ligo.org/pygwb/ (I have updated the README in the repo to reflect this)

Sbozzolo commented 1 year ago

Thanks, @a-renzini. I went through the documentation and here are some comments.

I haven't gone through the APIs carefully yet, but they seem fairly comprehensive.

To sum up my current assessment of your documentation: you wrote good pages/API/tutorials, but there is no underlying thread that joins them. As a new user, I still feel lost and I couldn't explain what pyGWB does exactly. I would recommend holding your users's hands a little bit more (e.g., providing a learning progression). I can give you examples of packages with good documentation if you want some inspiration on how to structure yours. One that might be close is LEGWORK.

kevinturbang commented 11 months ago

Dear @Sbozzolo,

First of all, thank you very much for all the useful feedback. This has been very valuable to get the documentation to its current status. We have implemented the comments above, and tried to add a more logical structure within the documentation itself. Could you please have a look and let us know if any further improvements are necessary? Thank you!

Sbozzolo commented 11 months ago

Dear PyGWB team,

Thank you for your efforts. The user documentation is looking great (I like the "tips" to direct the reader to other pages).

I only have two remaining comments, one trivial, and the other less so:

Sbozzolo commented 11 months ago

Also, the CONTRIBUTING.MD file and the contributing page in the docs are not perfectly in agreement.

By the way, can non-LIGO users contribute using LIGO's gitlab?

kevinturbang commented 9 months ago

Dear reviewers,

Apologies for the delay on our end. We have added the following to address the points above:

Hopefully this addresses all your remaining comments. Please let us know if anything else needs modification.

a-renzini commented 9 months ago

Dear @Sbozzolo and team, can we mark this issue as closed? Thanks!

a-renzini commented 9 months ago

(sorry accidentally hit the purple button)

Sbozzolo commented 9 months ago

Documentation is a never ending process, and I invite you to keep working on it as you expand PyGWB. For JOSS, feel free to close this issue.