Closed — broeder-j closed this 4 years ago
OK, the calcjob hash on the CI is different...
Thanks! I will give this a more thorough review later, but regarding the first point: We can definitely merge this even if it's not 100% ready yet, since the same is true for the package as a whole.
Let me know if you want me to take a look at the CI issue.
Also, I am not sure how to resolve these pre-commit mypy issues:
tests/export_cache/test_export_cache.py:23: error: Class cannot subclass 'WorkChain' (has type 'Any')
tests/export_cache/test_export_cache.py:23: error: Base type WorkChain becomes "Any" due to an unfollowed import
aiida_testing/export_cache/_fixtures.py:315: error: Need type annotation for 'union_pk' (hint: "union_pk: Set[<type>] = ...")
Found 3 errors in 2 files (checked 13 source files)
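The two kinds of errors above have standard fixes. The following is a minimal sketch (the class name is hypothetical, standing in for the test workchain): a per-line `# type: ignore` for the subclassing error caused by the unfollowed import, and an explicit annotation for the empty set, as mypy's own hint suggests.

```python
from typing import Set

# 1) Base class becomes "Any" due to an unfollowed import; a targeted
#    ignore comment silences only this error and keeps other checks active:
#
#    class DiffWorkChain(WorkChain):  # type: ignore
#        ...
#
# 2) An empty container gives mypy nothing to infer the element type from,
#    so it needs an explicit annotation:
union_pk: Set[int] = set()
union_pk.update({1, 2, 3})
print(union_pk)
```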
I've added comments on how to fix the mypy issues - but I'm also not 100% convinced that mypy is a net benefit here, given that I also have limited experience with it so far. This depends on whether you (and the other contributors) want to get acquainted with mypy, or if it's mostly a nuisance.
Tagging @PhilippRue @ltalirz for comment.
Up to you, guys. I've never used mypy, but now that Python 3.5 is the lowest Python version we support (and 3.6 will be in September), I'd say it's only a matter of time before type hints enter aiida-core (they would certainly be welcome!).
Since there seem to be a couple of different flavors (+ changes again in 3.6), I'd appreciate it if someone who has experience would write a short AEP on how we should go about this.
Well, then maybe aiida-testing is a decent testing ground for that, too.
With "different flavors", do you mean external libraries? In the standard, I'm only aware of the typing module - and newer Python versions just add to that (and to where hints are allowed).
Right, I just meant the different places for type hints - there is PEP 483, 484, 526, 3107 etc. - not external libraries. I guess it's quite straightforward if you've used it once.
Oh, I see. The way I understand it so far, these PEPs mostly build upon each other - so there's no competition between flavors, just different levels of what's possible.
Concerning mypy: it would be nice if one could ignore all types injected from external library imports, which is the only thing I find annoying. Also, the type annotations make things less readable in my opinion, though I see the benefits.
The external library imports can be ignored by adding an entry in https://github.com/aiidateam/aiida-testing/blob/develop/mypy.ini, for each import that should be ignored.
Ignoring all external imports is explicitly discouraged by mypy, because it can hide actual errors.
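For reference, a per-library entry in mypy.ini looks roughly like this (the module pattern below is just an example; each library you want to silence gets its own section):

```ini
; Hypothetical mypy.ini entry - one section per untyped dependency:
[mypy-aiida.*]
ignore_missing_imports = True
```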
OK, regarding the hash differing on the CI - these are the objects to hash:
[{'withmpi': False, 'resources': {'num_machines': 1, 'num_mpiprocs_per_machine': 1}, 'append_text': '', 'parser_name': 'diff', 'prepend_text': '', 'output_filename': 'patch.diff', 'scheduler_stderr': '_scheduler-stderr.txt', 'scheduler_stdout': '_scheduler-stdout.txt', 'mpirun_extra_params': [], 'environment_variables': {}, 'import_sys_environment': True, 'custom_scheduler_commands': ''}, {'file2': 'f744e4551095439a50a78488b89bf85e6b9bd347d5f99bbfd2fde64954cf91c5', 'file1': 'f6d1c358fb7a03d0831faf710b39ea945911fddab259be85ed0d25a51e62c992', 'parameters': '57b7298bdbd24a13add102c387767d645f9c8e0a86a9d0b1ab6492bda59e0758', 'code': 'bce6cfbda7e548b4af0f27cf5cd39a1245b21f3f6c85a0b5a9278ccc7f6837cc'}]
[{'withmpi': False, 'resources': {'num_machines': 1, 'num_mpiprocs_per_machine': 1}, 'append_text': '', 'parser_name': 'diff', 'prepend_text': '', 'output_filename': 'patch.diff', 'scheduler_stderr': '_scheduler-stderr.txt', 'scheduler_stdout': '_scheduler-stdout.txt', 'mpirun_extra_params': [], 'environment_variables': {}, 'import_sys_environment': True, 'custom_scheduler_commands': ''}, {'file2': 'a403c9809fc94ab203da6536a184e194a7856d50e73369345e26bb9ab1f39e5a', 'file1': 'ee69f1c4e4caefadf18fec511d6c780dbfdbcc7ad8b8bfcb4a954b3851b0f654', 'parameters': 'a5045df6d3d8b067897494f3facf13b340316569c13ac747b1c611fbf7e19725', 'code': 'bce6cfbda7e548b4af0f27cf5cd39a1245b21f3f6c85a0b5a9278ccc7f6837cc'}]
The parameters and the single file have different hashes on the CI and on my computer. @greschd, why? Is the absolute path included in the hash for single-file nodes, or is the repo location included for all nodes?
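A tiny (hypothetical) helper makes it easy to see which of the hashed inputs differ between the two lists above - the hash values below are shortened, illustrative copies:

```python
def differing_keys(a: dict, b: dict) -> list:
    """Return the keys whose hashed values differ between two runs."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))

# Shortened hashes taken from the two object lists above:
local = {'file1': 'f6d1c358', 'file2': 'f744e455',
         'parameters': '57b7298b', 'code': 'bce6cfbd'}
ci = {'file1': 'ee69f1c4', 'file2': 'a403c980',
      'parameters': 'a5045df6', 'code': 'bce6cfbd'}

print(differing_keys(local, ci))  # → ['file1', 'file2', 'parameters']
```

As printed, only the code hash agrees; both files and the parameters differ.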
Hmm, I'm not sure about this.. I think the abspath is not responsible, because you can .clone() a SingleFileData also locally, and get the same hash.
The repository location shouldn't be in the hash, but I have never done testing across different AiiDA installations / profiles.
In principle there could be a problem with uppercase / lowercase file names (depending on the file system) due to the Folder hashing, which is implemented here: https://github.com/aiidateam/aiida-core/blob/develop/aiida/common/hashing.py#L260
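To illustrate the case-sensitivity concern: a greatly simplified sketch of folder hashing (not aiida-core's actual implementation, which is linked above) shows how a hash that incorporates file names as reported by the OS can diverge if the file system changes their case:

```python
import hashlib

def folder_hash(filenames):
    """Toy model of folder hashing: the digest depends on the exact
    file names as reported by the operating system."""
    digest = hashlib.sha256()
    for name in sorted(filenames):
        digest.update(name.encode())
    return digest.hexdigest()

# The same logical file, reported with different case, hashes differently:
print(folder_hash(["INPUT.txt"]) == folder_hash(["input.txt"]))  # → False
```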
I guess a helper to figure out which parts of the hash differ would be quite convenient right now (see https://github.com/aiidateam/aiida-core/issues/2549), but for now: can you also print _get_objects_to_hash of the files and parameters?
Just tried creating the same SinglefileData on two different systems (with different repo locations), and I'm also getting the same hash.
On my system, all tests pass with these changes (compared to commit 97cc9c6, "type def and ignore for mypy, reverted previous change, excluded aiida version from calcjob hash", 4 hours ago, Jens Bröder):
diff --git a/aiida_testing/export_cache/_fixtures.py b/aiida_testing/export_cache/_fixtures.py
index c8a4362..13ef977 100644
--- a/aiida_testing/export_cache/_fixtures.py
+++ b/aiida_testing/export_cache/_fixtures.py
@@ -159,7 +159,8 @@ def load_cache(hash_code_by_entrypoint):
# currently this should only cache CalcJobNodes
qub = QueryBuilder()
qub.append(ProcessNode) # query for all ProcesNodes
- qub.append(Code)
+ # this breaks it:
+ #qub.append(Code)
to_hash = qub.all()
for node1 in to_hash:
node1[0].rehash()
@@ -222,7 +223,8 @@ def hash_code_by_entrypoint(monkeypatch):
"""
Return a list of objects which should be included in the hash of a Code node
"""
- return [self.get_attribute(key='input_plugin'), self.get_computer_name()]
+ return [self.get_attribute(key='input_plugin')]
+ #return [self.get_attribute(key='input_plugin'), self.get_computer_name()]
def mock_objects_to_hash_calcjob(self):
"""
diff --git a/tests/export_cache/caches/diff_workchain.tar.gz b/tests/export_cache/caches/diff_workchain.tar.gz
index ee313ac..795c096 100644
Binary files a/tests/export_cache/caches/diff_workchain.tar.gz and b/tests/export_cache/caches/diff_workchain.tar.gz differ
diff --git a/tests/export_cache/test_export_cache.py b/tests/export_cache/test_export_cache.py
index 98eabab..86c22c7 100644
--- a/tests/export_cache/test_export_cache.py
+++ b/tests/export_cache/test_export_cache.py
@@ -111,7 +111,8 @@ def test_mock_hash_codes(mock_code_factory, clear_database, hash_code_by_entrypo
ignore_files=('_aiidasubmit.sh', 'file*')
)
objs = mock_code._get_objects_to_hash()
- assert objs == [mock_code.get_attribute(key='input_plugin'), mock_code.get_computer_name()]
+ assert objs == [mock_code.get_attribute(key='input_plugin')]
+ #assert objs == [mock_code.get_attribute(key='input_plugin'), mock_code.get_computer_name()]
@pytest.mark.timeout(60, method='thread')
@@ -153,7 +154,7 @@ def test_run_with_cache(
#Test if cache was used?
diffjob = node.get_outgoing().get_node_by_label('CALL')
cache_src = diffjob.get_cache_source()
- calc_hash_s = '833070207922ee3dccbc477db7bbbf871d9df68bef2624604144f76b0edc6f0a'
+ calc_hash_s = '0c048c64551bd0b099ed82a90ff2c36833a3f1d0620712e7059367a67dac5c91'
calc_hash = diffjob.get_hash()
assert calc_hash == calc_hash_s
assert cache_src is not None
@@ -200,7 +201,7 @@ def test_with_export_cache(
#Test if cache was used?
diffjob = node.get_outgoing().get_node_by_label('CALL')
cache_src = diffjob.get_cache_source()
- calc_hash_s = '833070207922ee3dccbc477db7bbbf871d9df68bef2624604144f76b0edc6f0a'
+ calc_hash_s = '0c048c64551bd0b099ed82a90ff2c36833a3f1d0620712e7059367a67dac5c91'
calc_hash = diffjob.get_hash()
assert calc_hash == calc_hash_s
assert cache_src is not None
Note that changing the monkeypatch for the code hash is needed if one first imports an export file that contains the code, because the code then gets the "(Imported)" addition to its name.
I now exclude the version from most aiida data types. I had to monkeypatch, because the version is doubled in there; now it works ;-). Or do we want the tests to fail if the aiida version changes? If I have done it right, the version check is still in for package-dependent data types which do not inherit from Dict, RemoteData, or SinglefileData. Maybe this can be done smarter; base types and arrays are also not patched yet. Or one could patch Node in general to always exclude the version.
Hmm, maybe ignoring the (aiida or plugin) version can be a separate fixture? That way, a plugin developer can decide - if you always want to have the fixture, all you need to do is create a "dummy" fixture which uses it, and has autouse=True set.
I agree, now the hash_code_by_entrypoint fixture does more than just change hashing for the code (e.g. it changes it for SingleFileData as well).
Also, we might want to provide an option for with_export_cache to overwrite the output file. Currently, the export file is only written if no previous file is found.
This could be helpful for debugging/developing workflows (see https://groups.google.com/forum/?hl=en#!topic/aiidausers/y-ekFBlLgsg from mailing list).
Done
@broeder-j @PhilippRue I think we can do the following: We merge this PR now - I will go over the functionality and try to implement the "node selection by what ran in the context manager" discussed above. After that I will create a new PR (possibly with some interface changes...), and have the two of you review it / test it on your plugins.
Does that work for you?
sounds good 👍
Since it seemed to be incompatible with the changes in #30, I merged it into a separate branch for now (to keep the tests running on develop).
Hi, I put this here for review, @PhilippRue and @greschd, since my time to work on this is sparse. The code might not be totally fine yet; I left both fixture versions (run_with_cache, with_export_cache) side by side. Maybe we should do a first merge, so that we can continue to work on smaller things at a time.
After the Slack discussion: the run_with_cache fixture was split, so that one can use export_cache and load_cache on their own. There are simple tests for these. In order for the run_with_cache fixture test to work, I monkeypatched code._get_objects_to_hash and calcjob._get_objects_to_hash (since it used the computer uuid); there is a fixture for this. Thanks @greschd for this suggestion.
PhilippRue has solved this issue differently, by introducing the aiida-core and mock code fixtures with session scope instead of function scope, plus imports of computers and codes. In aiida-kkr he has running examples with run_with_cache. This may not be a solution for all developers, though.
Further, I have also implemented a second fixture to run with exports, where you have to specify the export to use; feature suggestions for this came from @greschd:
Known issue: sometimes the caching does not work when running (which causes the tests to fail), and I do not understand why - the hash of the imported and the running calcjob is the same. @greschd, maybe you can spot the reason for this.
I left both fixture versions in, since I would like the export name to be chosen in an automatic way by default. Fixes #20.