JuDFTteam / best-of-atomistic-machine-learning

🏆 A ranked list of awesome atomistic machine learning projects ⚛️🧬💎.
Creative Commons Attribution Share Alike 4.0 International

Fix update-best-of-list Action error #345

Closed Irratzo closed 1 month ago

Irratzo commented 1 month ago

Describe the issue:

Issue split off from #256.

After having added new projects (batch-processable 'yes' add-project issues) to projects.yaml in #256, manually ran the update-best-of-list Action workflow. The run of workflow update-best-of-list failed with error

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/best_of/generator.py", line 107, in generate_markdown
    projects = projects_collection.collect_projects_info(
  File "/usr/local/lib/python3.8/site-packages/best_of/projects_collection.py", line 622, in collect_projects_info
    if project_info.name.lower() in unique_projects:
TypeError: 'Dict' object is not callable

Reverted projects.yaml back to previous states and ran workflow again. Even for previous states where the workflow had succeeded before, this new error kept appearing. My debugging process is documented in the following comments.

Irratzo commented 1 month ago

Corrected errors and warnings in commit 1496d8b.

Subsequent workflow run failed with same error.

Irratzo commented 1 month ago

The source code locations throwing the above error are in the best-of-generator project. Here are the corresponding lines in its latest release: generator.py, line 107 (permalink), and projects_collection.py, line 648, not 622 (permalink). So, my workflow here seems to use an earlier version. Indeed, the workflow runs here and here show that it collected and installed best-of==0.8.5. At the moment, this is also the latest release, from Jan 11, 2022. But the latest tag is 0.8.6, here, from Jan 17, 2022. There have been multiple commits since then without a new release, starting with this one and ending with the most recent one from Jan 2023.

A thing to note about the TypeError above is that project_info here is not a built-in Python dict, but an addict Dict. This type of dictionary allows key-value access via the dot operator, as though the key were an attribute. So, project_info.name for an addict Dict is equivalent to project_info['name'] for a built-in dict. So, this should work. However, it doesn't. Above, the object is created in a loop like this.

    for project in tqdm(projects):
        project_info = Dict(project)

        if project_info.name.lower() in unique_projects:
        # TypeError: 'Dict' object is not callable

This function is called in generate_markdown like this, where another function creates the projects list.

        config, projects, categories, labels = parse_projects_yaml(projects_yaml_path)

        # [...]

        projects = projects_collection.collect_projects_info(
            projects, categories, config
        )

The function parse_projects_yaml is here permalink, where the projects are simply loaded as list from the projects.yaml file like this.

    with open(projects_yaml_path, "r") as stream:
        parsed_yaml = yaml.safe_load(stream)

    projects = parsed_yaml["projects"]
Irratzo commented 1 month ago

There are several options on ways to debug this.

Option 1). Remote, indirect. Step back in commit history, using git revert, until the workflow stops failing. Then narrow down the error cause from there. Afterwards, successively add stuff from the original new target state, either manually or via double-revert, until we are back at the original, now repaired, state.

# git revert = undo a previous commit or commits
# undo last or specific commit
git revert commit-hash
# undo range of commits, including the first one
git revert oldest-commit-hash^..latest-commit-in-range
# the caret '^' ensures that `oldest-commit-hash` is included.

References. 1, 2.

Option 2). Local, indirect. Install best-of-generator package (same version as here, 0.8.5) in local env and run the yaml -> markdown creation manually, and debug that with the current projects.yaml that causes the error.

Option 3). Local, direct. Debug the Action workflow directly in VSCode. From Google Search How to debug a github action.

Side note. Should really switch my repo to use the new best-of-generator release... Maybe I can just fork it and then instruct my local pipeline to use that latest version instead of the official one.

Irratzo commented 1 month ago

Option 1). Reverted back to commit 690529435f2eec444d2d377daed403ff10186062, state before adding all the new projects. Ran workflow.

git revert c21de8e9f381d4ac07e1df688ad426456a8cfd7e^..1496d8bb23970b39a069e63155d1fce57887af70

Same error.

Option 2). Created a new env and installed the package with uv pip install best-of (remember to deactivate and reactivate the env to make the CLI tools available). Copied the current state of the input files (projects.yaml, header, footer, etc.) into that folder and ran best-of generate projects.yaml locally. It succeeded both with best-of==0.8.5 and best-of==0.8.5. I.e., a README.md was produced, and the above TypeError did not appear. That is confusing. I would have expected it to fail the same way as the action above, since the input is the same and the best-of-generator version is the same. But maybe the parameters are different, as evidenced by the fact that the list of warning messages is slightly different, too. Or maybe something else is causing the differing behavior.

Irratzo commented 1 month ago

Option 1). Reverted back to commit 6c78866a46bd5332e3106c3514de0aba0dd4c731, last automatic pull request and last time workflow ran successfully.

git revert 4cc287fe7dc6a5e7a6becd34679e8fa638365a22^..690529435f2eec444d2d377daed403ff10186062

Same error.

Option 2). Ran best-of generate projects.yaml locally with same file as in Option 1). Run succeeds without error.

Both envs, remote and local, use the same addict version 2.4.0. Why does it fail now in the workflow run, even though the exact same file succeeded before? Now I am really confused.
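
To compare the two environments more systematically, the installed versions can be printed in both the local env and the action's container. This is a minimal sketch using only the standard library; the function name installed_versions is hypothetical, and the list of distributions to check is an assumption based on the packages mentioned in this issue.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for dist in distributions:
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            # Not installed in this environment.
            versions[dist] = None
    return versions


# Example call (the output depends on the environment it runs in):
# print(installed_versions(["best-of", "addict", "PyYAML"]))
```

Running the same call locally and as an extra step in the workflow would show whether anything besides best-of and addict differs between the two environments.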

Irratzo commented 1 month ago

Option 1). In commit 93ff773, replaced the original best-of-update action with a personal fork of it.

How did I do that. Forked the original action repo, adjusted the action's name in action.yml, created a release 0.1.0 with tag v0.1.0. Publication to GitHub Marketplace was not necessary. Then in this repo's update-best-of-list.yml workflow, replaced the GitHub address of the respective action.

First adaptation I did was to change Python and best-of version in the action's Dockerfile from 3.8 to 3.12 and from 0.8.5 to 0.8.6 (latest).

Then ran the update-best-of-list workflow again here. It failed again, with the same error. The exception output looks only different because the Python version changed.

92%|█████████▏| 397/430 [05:06<00:25,  1.29it/s]
Error: -13 13:34:14,864 [ERROR] Failed to generate markdown.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/best_of/generator.py", line 107, in generate_markdown
    projects = projects_collection.collect_projects_info(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/best_of/projects_collection.py", line 648, in collect_projects_info
    if project_info.name.lower() in unique_projects:
       ^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'Dict' object is not callable
Irratzo commented 1 month ago

Side note. Tested out addict dictionary Dict a bit in Colab https://github.com/mewwts/addict.

Could reproduce this specific error, TypeError: 'Dict' object is not callable. It is generic behavior of a Dict object x that does not depend on its content, meaning x can be empty. If you access a non-existing member y via dot notation, it returns a new empty Dict object. The original x is not affected, i.e. no member y is created. Calling a non-existing member therefore means calling that empty Dict, which raises the TypeError.

# !pip install addict
from addict import Dict
x = Dict()
x.y
# returns {}, a new empty Dict
x
# returns {}, the original unchanged dict
x.bla()
# TypeError: 'Dict' object is not callable

So, what happens above in line if project_info.name.lower() in unique_projects:, is probably this. project_info.name returns an empty Dict object rather than a string, because the member does not exist. That can only be true, if the original project entry in yaml file had no name field. Check that now by going through the list.

Went through projects.yaml projects list in Emacs. M-s o (function occur), for pattern - name:, yielded 370 results with current file state, and none of the names were empty. occur for pattern category: yielded also 370 results. This implies that no names are missing.
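
A text search that counts - name: and category: lines would not flag a list item that starts with a different key, because such an item adds to neither count. Checking the parsed list directly avoids that blind spot. The following is a minimal sketch; the function name find_entries_without_name is hypothetical, and it assumes the projects list has already been loaded, for example with yaml.safe_load as in parse_projects_yaml above.

```python
def find_entries_without_name(projects):
    """Return (index, entry) pairs for entries lacking a non-empty string 'name'."""
    problems = []
    for index, entry in enumerate(projects):
        # Entries that are not mappings cannot have a name at all.
        name = entry.get("name") if isinstance(entry, dict) else None
        if not isinstance(name, str) or not name.strip():
            problems.append((index, entry))
    return problems


# Illustrative data only, not taken from projects.yaml.
sample = [
    {"name": "project-a", "category": "some-category"},
    {"homepage": "https://example.org", "license": "MIT"},
    {"name": "", "category": "some-category"},
]
for index, entry in find_entries_without_name(sample):
    print(f"entry {index} has no usable name: {entry}")
```

Run against the real list, an empty result would confirm that every list item carries a name, independent of how the file is formatted.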

Irratzo commented 1 month ago

Option 1). By replacing the action above with my own, I can now also replace best-of-generator with a personal fork of it and add some debugging output to workflow run to analyze the error. Let's do that.

How did I do that. Forked the original action repo, packed the code that throws the original error of this issue in try-except with logging in best-of-generator fork, commit 253977a, create release (not really necessary, but better reproducible).

Now, in my update-best-of-list action personal fork, switch to adapted best-of-generator personal fork in this commit, create new release of the action personal fork. Then update the action version used in the workflow in this project, done in commit 4ae5646.

Irratzo commented 1 month ago

Ran workflow. The run failed. But now with the debug output from my best-of fork, here.

 Warning: 3 15:43:43,711 [WARNING] TypError on Dict-mapped project_info name for project {'Documentation': 'http://qmlearn.rutgers.edu/', 'license': 'MIT'}, info {'Documentation': 'http://qmlearn.rutgers.edu/', 'license': 'MIT'}.
 92%|█████████▏| 397/430 [07:17<00:26,  1.26it/s]
 92%|█████████▏| 397/430 [07:17<00:36,  1.10s/it]
Error: -13 15:43:43,711 [ERROR] Project has no 'name' field. Abort. Project {'Documentation': 'http://qmlearn.rutgers.edu/', 'license': 'MIT'}, info {'Documentation': 'http://qmlearn.rutgers.edu/', 'license': 'MIT'}.
Error: -13 15:43:43,712 [ERROR] Failed to generate markdown.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/best_of/projects_collection.py", line 649, in collect_projects_info
    project_name = project_info.name.lower()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'Dict' object is not callable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/best_of/generator.py", line 107, in generate_markdown
    projects = projects_collection.collect_projects_info(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/best_of/projects_collection.py", line 656, in collect_projects_info
    raise err
  File "/usr/local/lib/python3.12/site-packages/best_of/projects_collection.py", line 653, in collect_projects_info
    project_name = project["name"].lower()
                   ~~~~~~~^^^^^^^^
KeyError: 'name'

So, the new project entry QMLearn, add-project issue #250, was the original source of the error. It was added in #256 along with the other projects. Its YAML entry was malformed, and this led to the cryptic error. The malformed entry was added to projects.yaml and present in the #256-relevant commits c21de8e and 1496d8b. We can see the malformed entry there.

- name: QMLearn
  description: Quantum Machine Learning by learning one-body reduced density matrices in the AO basis. https://doi.org/10.1038/s41467-023-41953-9
  gitlab_id: pavanello-research-group/qmlearn
  category: ml-esm
- Documentation: http://qmlearn.rutgers.edu/
  license: MIT
  # package managers: none

The line - Documentation: ... was the issue that caused the error. It led to the action recognizing this as two separate projects, where the second project did not have a name entry. The well-formed QMLearn entry should have been # homepage: ... or homepage: .... In the subsequent commits, I reverted the project additions (more on that below), and finally added them again, now with the QMLearn entry formatted correctly, in commits f7a2aec and 046be81.

- name: QMLearn
  description: Quantum Machine Learning by learning one-body reduced density matrices in the AO basis. https://doi.org/10.1038/s41467-023-41953-9
  gitlab_id: pavanello-research-group/qmlearn
  category: ml-esm
  homepage: http://qmlearn.rutgers.edu/
  license: MIT
  # package managers: none
  # Note. Documentation not linked in repo README at time of writing. So, making an exception and adding it here as homepage.
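
The failure mode of the malformed entry can be replayed without the generator. The sketch below makes two assumptions: the two-item list is what the malformed YAML parses into (the second item matches the dictionary printed in the debug log above), and AutoDict is a hypothetical stand-in that mimics only the missing-attribute behavior of addict's Dict described earlier, not the real class.

```python
class AutoDict(dict):
    """Hypothetical stand-in for addict.Dict, covering only missing-attribute access."""

    def __getattr__(self, key):
        # Only reached when normal attribute lookup fails.
        if key in self:
            return self[key]
        return AutoDict()


# Assumed parse result of the malformed entry (description omitted for brevity).
parsed_projects = [
    {"name": "QMLearn", "gitlab_id": "pavanello-research-group/qmlearn", "category": "ml-esm"},
    {"Documentation": "http://qmlearn.rutgers.edu/", "license": "MIT"},
]


def lowercase_name(project):
    """Mirror the failing expression project_info.name.lower()."""
    project_info = AutoDict(project)
    return project_info.name.lower()


def classify(project):
    """Return the lowercased name, or a description of the TypeError raised."""
    try:
        return lowercase_name(project)
    except TypeError as err:
        return f"TypeError: {err}"


for project in parsed_projects:
    print(classify(project))
```

The first item yields its lowercased name. The second has no name key, so .name returns an empty AutoDict, .lower returns another one, and calling it raises the same kind of "not callable" TypeError as in the workflow.
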
Irratzo commented 1 month ago

The strange thing is what happened when I had reverted the commits with the new projects from #256. These were the commits 005b8e3, 4ce1f3b, ... (more reverts), c1bd7b5 (largest back-in-time revert) ... (workflow debugging with personal forks, described above), in the time Aug 13, 11:20 AM to 4:33 PM.

In all of these reverted commits, the project QMLearn was absent from projects.yaml. In fact, it was nowhere in the whole repository (checked with grep). But still, all of the workflow runs with these commits to main failed with the same error, as though the malformed QMLearn entry was still present in projects.yaml. These were the five runs in that timeframe, 1, 2, 3, 4, 5.

From commit f7a2aec onward (Aug 13, 5:02 PM), I restored projects.yaml to include the additions from #256, but now with the corrected QMLearn project entry. The run with this commit also failed with the same error, as if the QMLearn entry were still malformed.

Then I had the idea that the only possible cause is that the GitHub Actions update-best-of-list workflow of this project is somehow pulling an old commit from main, where the error still exists. I first looked into the workflow file, .github/workflows/update-best-of-list.yml. There are two actions/checkout@v4 mentions (one conditional), so the repo must be pulled once or twice. Then I looked into the workflow run logs linked above. At first, the repo is pulled at the correct, current commit, for example here. But then a second actions/checkout@v4 step is performed. It overwrites the previous pull with the old, incorrect commit, here. This is commit 1496d8b, where the QMLearn entry is still malformed.

I have not yet figured out why this is happening. It never happened before that a commit older than the current one was pulled.

TODO

Irratzo commented 1 month ago

It turned out that I did not have to figure out the root cause of this strange behavior.

What resolved it, was to do a new commit with a slight change to projects.yaml, a day later, in commit 046be81, and then run the workflow again. This workflow succeeded. Both actions/checkout@v4 steps now pull the same, current commit from main, here and here.

So, the error is solved. But I don't know how to solve the second error in this issue (workflow pulls an older commit) in case it happens again. The first tip is to do as in this comment: make a slight change to projects.yaml (a comment will do), commit, and run the workflow again. If that doesn't help, go back to the previous comment and find the root cause of this behavior.
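
If it does happen again, a small guard step could at least make the mismatch visible early instead of surfacing as an unrelated error. The following is a sketch under assumptions, not something from the workflow: the function names are hypothetical, it assumes git is available in the step, and it compares HEAD with GITHUB_SHA, the commit that triggered the run. If the workflow checks out a different ref on purpose, a mismatch would be expected.

```python
import os
import subprocess


def checked_out_commit(repo_dir="."):
    """Return the full hash of HEAD in repo_dir via `git rev-parse HEAD`."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def same_commit(expected, actual):
    """Compare two hashes, allowing either to be abbreviated (at least 7 chars)."""
    expected, actual = expected.strip().lower(), actual.strip().lower()
    if min(len(expected), len(actual)) < 7:
        return False
    return expected.startswith(actual) or actual.startswith(expected)


def verify_checkout(repo_dir="."):
    """Exit with a clear message if HEAD is not the commit named in GITHUB_SHA."""
    expected = os.environ.get("GITHUB_SHA", "")
    actual = checked_out_commit(repo_dir)
    if not same_commit(expected, actual):
        raise SystemExit(
            f"Checked-out commit {actual} differs from GITHUB_SHA {expected!r}"
        )
    return actual


# In a workflow step, after the checkout steps, one would call:
# verify_checkout()
```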

Irratzo commented 1 month ago

Forgot something.

Should return this project's Action workflow update-best-of-list to its original state. Then rerun and test if it still works.

Switched back in commit 92c35dd. Ran workflow. It succeeded.

Keep original workflow for now. But created new issue #348 for thinking about replacing the official one with a self-maintained one. This issue showed that such a procedure is feasible.