Closed giovannipizzi closed 4 years ago
@giovannipizzi after looking into this, I realize I had misunderstood this issue: For some reason, I assumed we could just specify the full list of files that could possibly be copied. Instead, of course AiiDA will check if the files are actually present, and fail if they are not.
Explicitly specifying the files is problematic because of the UNK*
files, whose filenames are somewhat dynamic.
I think we should decide what the long-term interface should be, and then implement a (maybe suboptimal) version that will be forward-compatible with that. If it's feasible to keep the listdir
of a RemoteData
in the local DB, I would actually prefer the current interface.
For reference, here's the list of possible input files
seedname.mmn
seedname.amn
seedname.dmn
seedname.eig
seedname.win # Generated by the plugin
seedname.chk # Can also be OUTPUT
UNKp.s
seedname_htB.dat # Can also be OUTPUT
seedname_htL.dat
seedname_htR.dat
seedname_htC.dat
seedname_htLC.dat
seedname_htCR.dat
seedname.unkg
seedname.uHu
seedname.spn
Hi @greschd , thanks!
Indeed, for the local copy list, all files need to be specified. But for these, we have them in the repository so we can expand the '*' if needed.
For the remote copy list, the call is here:
and transport.copy supports globbing (currently in the plugin we go and expand UNK*
to the explicit list of files, but this is not needed, and we could just say UNK*
in the source, and a folder name in the destination).
The only problem I see (maybe we discussed this already) is the order of operations - now the files in the local_copy_list
are transferred first, and might be overwritten by the remote_copy_list
(and probably we would like by default to have the opposite behaviour?).
In the future we can have in AiiDA a flag to swap the order.
But if we go with #76, this is not a problem right? Because we can only specify one parent_folder
, either FolderData
or RemoteData
.
We just need to put in some logic where we expand the *
with the files_finder
only if we have a FolderData
- would this work?
Aha! Yeah this might work.. so the logic would be
local_input_folder
: keep current behaviorremote_input_folder
:
remote_symlink_list
, with globbing for UNK*
seedname.chk
in remote_copy_list
is that correct?
Of course we won't have the validation or required input files in the case of a remote input, but I think that is unavoidable.
We probably want to keep additional_remote_symlink_list
, additional_remote_copy_list
and additional_local_copy_list
, right?
I think what you suggest is ok.
A comment on one thing that I just realised, however - if you call transport.copy
with a specific filename, if will fail if it does not exist, while it does not raise if it's a globbing pattern that does not match anything... Maybe this is a bit of an unexpected behaviour (and this might make things a bit more complex in our case).
See e.g.
In [21]: t.copy('sd?', 'output')
In [22]: t.copy('sdf', 'output')
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-22-3a076541162d> in <module>
----> 1 t.copy('sdf', 'output')
~/git/aiida-core/aiida/transports/plugins/local.py in copy(self, remotesource, remotedestination, dereference, recursive)
552 if not self.has_magic(remotesource):
553 if not os.path.exists(os.path.join(self.curdir, remotesource)):
--> 554 raise OSError('Source not found')
555 if self.normalize(remotesource) == self.normalize(remotedestination):
556 raise ValueError('Cannot copy from itself to itself')
OSError: Source not found
Maybe an approach would be to symlink files using the pattern *.amn
ignoring what the seed name is?
Or other ideas?
Yeah, I would just glob on the suffix, then.
As a long-term solution, maybe we can introduce the concept of "required" vs. "optional" files in AiiDA itself? I think this only makes sense if other codes use a similar pattern, though.
It's also just a bit too easy to accidentally open a transport. Here we do it explicitly, but I think this could be hidden in a call to RemoteData.listdir
.
Yeah, I would just glob on the suffix, then.
Great!
As a long-term solution, maybe we can introduce the concept of "required" vs. "optional" files in AiiDA itself? I think this only makes sense if other codes use a similar pattern, though.
It's a good point. Do you want to open an issue on aiida-core (I don't think we have one), both for the order of local vs. remote, and for this?
It's also just a bit too easy to accidentally open a transport. Here we do it explicitly, but I think this could be hidden in a call to RemoteData.listdir.
It's a good point. Suggestions welcome on how to change this (maybe we have to change RemoteData.listdir
to have a specific option open_transport=True
, and set this by default to False, and use some internal cache as you were suggesting? I think this is also good for an issue on aiida-core (that might go to 2.0 though, since it's backward-incompatible)
Currently, during the
prepare_for_submission
we open a transport to inspect the files available on the remote, and then perform logic to decide what to do:https://github.com/aiidateam/aiida-wannier90/blob/68c03557465b5e0e44d95c5e92549175460abd06/aiida_wannier90/calculations.py#L142-L149
However, this is not ideal, because
While we discuss if this can be changed in AiiDA itself (see aiidateam/aiida-core#3477), I think we can apply the following different design: instead of inspecting the files and deciding what to do, we should instead ask explicitly the user where the files are (e.g. via specific flags in the
settings
input node, and maybe require these only if we pass both a remotedata input and a folder data input)? Anyway, for single simulations, the user can check; and for workflows, the workflow probably knows what the previous step retrieved.