Closed by jordanpadams 11 months ago
@jordanpadams
Because of the way Cucumber works, it goes back for a download on every test: validate is run from the CLI each and every time, and there is no Java wrapper around it. Hence, the cache is per individual test, not for the entire test suite.
@al-niessner The SAs and I dug into this, and it does not look like anything abnormal is happening on the WAF side of the house that would block GitHub or my laptop builds (I was seeing these errors). It has become pretty evident that something specific to the Saxon upgrade is causing these intermittent failures. Not sure if they changed something in the way they cache the loaded data, or in how they handle HTTP vs. HTTPS, or something else. But looking back through the GitHub Actions history and doing some testing, it is pretty clear it's a Saxon upgrade thing.
If nothing else, I think we need to add some exception handling where `source.getSystemId()` is being called to track this issue down, and then maybe we just do a retry or two? Or if it is an HTTP vs. HTTPS thing, maybe we need to just automatically detect and fix those URLs? Not sure.
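A minimal sketch of the exception handling, retry, and HTTP-to-HTTPS ideas above, not what validate actually does. The class `SchemaFetchRetry` and its methods are hypothetical names invented for illustration; the only thing taken from the thread is that the system ID would come from `source.getSystemId()`. The download itself is passed in as a `Callable`, so nothing here assumes how Saxon or validate loads the resource.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of validate or Saxon.
final class SchemaFetchRetry {

  /** Rewrites an http:// system ID to https://; anything else is returned unchanged. */
  static String upgradeToHttps(String systemId) {
    if (systemId != null && systemId.regionMatches(true, 0, "http://", 0, 7)) {
      return "https://" + systemId.substring(7);
    }
    return systemId;
  }

  /**
   * Runs the action up to maxAttempts times. Each IOException is logged together
   * with the system ID so intermittent failures can be traced; the last one is
   * rethrown if every attempt fails.
   */
  static <T> T withRetry(String systemId, int maxAttempts, long delayMillis,
      Callable<T> action) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.call();
      } catch (IOException e) {
        last = e;
        System.err.printf("Attempt %d/%d failed for %s: %s%n",
            attempt, maxAttempts, systemId, e);
        if (attempt < maxAttempts) {
          Thread.sleep(delayMillis); // brief pause before the next attempt
        }
      }
    }
    throw last;
  }
}
```

A call site might look like `withRetry(id, 3, 500L, () -> load(id))`, where `id` is `upgradeToHttps(source.getSystemId())` and `load` stands in for whatever actually opens the stream. Only `IOException` is retried here; other exceptions propagate immediately so that real bugs are not hidden by the retry loop.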
I added a stacktrace to this branch, and you can see some more interesting information about when/how it is failing: https://github.com/NASA-PDS/validate/actions/runs/6662161511/job/18106097361#step:5:22250
I know we experienced some issues in the past with trying to clear the JAXB cache with different versions of the PDS4 schemas (see the comments in validate.feature regarding `Move github87 as it is interfering with github292 tests`)
@jordanpadams
Now this is funny. I am at run 6 with no failures from maven test on validate. I am using my phone's 5G, and it is not particularly fast. Seems slow networks are good networks.
@al-niessner I just ran it once and it failed... JPL VPN maybe?
@jordanpadams
No, I am trying to work out why it is intermittent. The behavior changing with network speed is consistent with a race condition somewhere, and Saxon looks like the likely culprit. Maybe they will release 12.4 with it fixed sooner than I can find it.
@al-niessner Copy that. A race condition makes sense here. Very annoying.
I am ok with reverting for now, and leaving a PR out there for the changes you have made, and we can come back to it if/when they fix it?
@jordanpadams
Why revert? I am on the trail of this, and it is not too difficult to work around. Besides, I do not have built-in mirrors to see behind me. Move forward or stand still.
@al-niessner sounds good. As long as we fix the errors, I am good. Just wanted to make sure you weren't banging your head too hard on the table trying to get this to work.
TestRail Test ID: T8681186
Checked for duplicates
Yes - I've already checked
🐛 Describe the bug
When I run `mvn test`, it sometimes works, sometimes fails with a message that it cannot download a schema or Schematron, and other times just hangs. Atmospheres also noted that validate starts running, then hangs and appears not to be doing anything.
🕵️ Expected behavior
I expected it would download and test successfully.
📜 To Reproduce
`mvn test` (may need to try it a few times)
🖥 Environment Info
macOS (me), Linux (ATM Node)
📚 Version of Software Used
v3.2.0, v3.4.0-SNAPSHOT
🩺 Test Data / Additional context
Any large test data sets.
🦄 Related requirements
⚙️ Engineering Details
DSIO ticket created to investigate CloudWatch and any blocking happening there
For development, let's verify that our local caching is actually working and that we are not going back to the JPL website over and over again
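One way to check the caching concern could be a disk-backed cache that counts how often it really goes to the network. The sketch below is an assumption-laden illustration, not validate's existing cache: `DiskSchemaCache`, `Fetcher`, and `remoteFetchCount` are hypothetical names, and the real download is abstracted behind the `Fetcher` interface. Because entries live on disk, they would also survive the case discussed in the comments where validate is launched fresh for every test.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical disk cache for illustration; not validate's actual implementation.
final class DiskSchemaCache {

  /** Stand-in for whatever performs the real download. */
  interface Fetcher {
    byte[] fetch(String url) throws IOException;
  }

  private final Path dir;
  private final Fetcher fetcher;
  private final AtomicInteger remoteFetches = new AtomicInteger();

  DiskSchemaCache(Path dir, Fetcher fetcher) {
    this.dir = dir;
    this.fetcher = fetcher;
  }

  /** Returns the cached bytes for url, calling the fetcher only on a cache miss. */
  byte[] get(String url) throws IOException {
    // Derive a stable file name from the URL (name-based UUID of its UTF-8 bytes).
    String name = UUID.nameUUIDFromBytes(url.getBytes(StandardCharsets.UTF_8)) + ".cache";
    Path cached = dir.resolve(name);
    if (Files.exists(cached)) {
      return Files.readAllBytes(cached);
    }
    remoteFetches.incrementAndGet();
    byte[] body = fetcher.fetch(url);
    Files.createDirectories(dir);
    // Write to a temporary file first, then move it into place.
    Path tmp = Files.createTempFile(dir, "download", ".tmp");
    Files.write(tmp, body);
    Files.move(tmp, cached, StandardCopyOption.REPLACE_EXISTING);
    return body;
  }

  /** Number of times this instance had to call the fetcher. */
  int remoteFetchCount() {
    return remoteFetches.get();
  }
}
```

Logging `remoteFetchCount()` at the end of a run would show whether repeated tests are served from disk or keep hitting the remote site. This sketch does not handle cache expiry or concurrent writers across processes.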