Closed mathause closed 4 years ago
I'm the one who put 'timestamp' in the list of keys to be saved, so I will look into this.
Another solution, also in sdsqlutils.insert.get_dict(), would be something like
if k in instance.__dict__:
d[k] = instance.__dict__[k]
But a more fundamental problem is that the file has no timestamp at this point. I will try to reproduce what you saw.
I see the problem of missing timestamp since July 29. It would be nice, if you can reproduce it. For me it looks like a server-side issue.
@mathause and @beyerle , what index node do you use?
There's no need to tell me what index node you use; they are probably all the same.
I could reproduce the problem. Synda uses this search url, which is correct:
https://esgf-node.llnl.gov/esg-search/search?variable_id=areacella&fields=*&project=CMIP6&frequency=fx&experiment_id=historical&type=File&format=application%2Fsolr%2Bjson
If you just visit this url and look hard, you can find the problem - look at the first of the three entries in the JSON list for AWI.
The problem is that the index node returns bad information on some files. It almost surely originates with whoever published it. We could work around a missing time stamp, but that is just a symptom. There is other questionable metadata there. A work-around to the timestamp problem would just postpone the error to when it is harder to catch.
BTW, we don't really need the 'timestamp' attribute. It's almost completely useless because it doesn't have a standard definition. This is the first time that I've seen a good use for it - it helped us catch a publishing error!
Synda does need to be fixed. Errors should not be mysterious, they should be easy to find and understand. The right thing to do is to catch the exception, log it, and move on to the next file. The problem file will never get into the database, let alone be downloaded, until someone asks the publisher to re-publish the file. But if the installation failure is properly logged, it will be possible to do this.
I will comment later once I have a fix ready.
I just submitted a pull request, "Two bug fixes to prevent crashes", #153. It fixes this issue and another, new problem which can cause a Synda crash. I think that this is more urgent than my earlier pull requests.
@painter1, we also encountered this problem. One thing that stands out is that all problematic files reside on esgf-data1.llnl.gov
as can be verified with a command like the one below. Using a different search node yields the same result. What's more, the same files on other replicas seem to work. So maybe there is a server problem going on after all?
wget -O- 'https://esgf-node.llnl.gov/esg-search/search?variable_id=areacella&fields=*&project=CMIP6&frequency=fx&experiment_id=historical&type=File&format=application%2Fsolr%2Bjson&limit=10000' | jq '.response.docs|map(select(has("timestamp")|not))|map(.data_node)|unique'
Hello all, been away for a while I apologize. @painter1 I did not see your PR and I had to deploy a hotfix for this specific issue in v3.11 which is currently live on conda. I think it's in the same spirit of the PR. I'll be marking this issue as closed for now, feel free to reopen if it pops back up.
@zklaus, I don't see what is wrong. For me that command line just returns
[ "esgf-data1.llnl.gov" ]
and visiting the url in a web browser, I don't see anything unusual. Could you specifically report what your experience is?
@AtefBN , note that my PR also addresses another "bug" which was newly created by a change in ESGF software. This change is deployed at LLNL and will probably be propagated throughout ESGF, eventually. And the change addressing this issue does not overlap with yours, and is generally useful. So please take a look at this pull request.
@zklaus, I don't see what is wrong. For me that command line just returns
[ "esgf-data1.llnl.gov" ]
and visiting the url in a web browser, I don't see anything unusual. Could you specifically report what your experience is?
What you say is exactly right and what this tells us is that this problem only occurs with the data node esgf-data1.llnl.gov
.
I should have broken the command down, let me do this now. The wget
part executes exactly the search that you used before with the addition of limit=10000
so that we get many more results than the default 10.
The resulting json is then filtered by the jq
command. The filter expression is a series of sub-expressions that works like this:
.response.docs| # extract the docs list that is part of the response field of the json
map(select(has("timestamp")|not))| # select all entries that have no "timestamp" field. remove the "|not" part to invert selection
map(.data_node)| # print the "data_node" field from those results
unique # remove duplicates
If we remove |not
from this, we get all results that do have a timestamp
field. This produces the following list:
[
"aims3.llnl.gov",
"crd-esgf-drc.ec.gc.ca",
"dist.nmlab.snu.ac.kr",
"esg-dn1.nsc.liu.se",
"esg1.umr-cnrm.fr",
"esgf-data.ucar.edu",
"esgf-data1.llnl.gov",
"esgf-data2.diasjp.net",
"esgf-data3.diasjp.net",
"esgf-node2.cmcc.it",
"esgf.nci.org.au",
"esgf.rcec.sinica.edu.tw",
"esgf3.dkrz.de",
"noresg.nird.sigma2.no",
"vesg.ipsl.upmc.fr"
]
So we see that files from all nodes have timestamp
s except for some files served by LLNL. Note that the data node in question appears in both lists, so the problem might be quite subtle.
If we restrict the search further to the AWI ESM by appending &source_id=AWI-ESM-1-1-LR
(convenience link), we get only the three replicas of one file.
The one replica of that file that resides at LLNL is lacking the timestamp
field, the other two replicas do have the timestamp
field.
Notably, the original upload at DKRZ (identified by replica: false
) is functional.
Thus, it seems something has gone wrong in the process of replication to the node esgf-data1.llnl.gov
at LLNL.
@painter1 Yes I will shortly get to ingest the PRs on this repo, hopefully by start of next week. I will subsequently cut a release with the changes for everyone to use.
@zklaus , thank you for that detailed information. This AWI file is the problem file from before. I will investigate and should understand the problem next week.
@painter1, thanks! I did deliberately choose the search and file that you mentioned before as an example. However, in our efforts to maintain our local partial replica of esgf data for analysis, we have encountered this problem with other files as well, always only with the LLNL node. If you need more examples, let me know and I will peruse the logs.
@zklaus, I just want you to know that your issue with timestamp not being available has not been forgotten. It's a publisher issue though, not a Synda issue. So Sasha Ames intends to look into it.
@painter1 thanks for keeping me posted! Let me know if you need more input or testing. Should I be tracking this in some other ways?
I'm not aware of anything formal, e.g. nothing on an issue tracker.
When trying to install a selection file synda fails with a key error. This started some days ago and does not happen for all queries. One example that fails
where
selection/cmip6_fx.txt
file looks as follows:This fails with the following traceback:
The discovery.log does not give any hints for missing timestamps:
I don't really have an idea what could cause this? Is it a parallel processing problem?
A potential workaround is to replace
with
Is it essential that the timestamp is correct?
@beyerle