OHDSI / Athena

Web application for distributing and browsing the Standardized Vocabularies for all instances of an OMOP CDM

cpt4.jar UMLS Endpoints #299

Open wtroddy opened 2 years ago

wtroddy commented 2 years ago

I'm trying to reconstitute the CPT4 vocabularies using the cpt4.sh file and am getting this error:

Exception in thread "main" org.odhsi.utils.cpt.Cpt4Exception: Cannot process CONCEPT.csv file. You can find more details in the logs/logfile.log file.
Reason: cannot request TGT
        at org.odhsi.utils.cpt.Application.main(Application.java:45)

I'm doing this in a VM that has a limited allowlist and suspect this is the problem. I've tested the same download/API key on another machine that isn't restricted and it works fine.

Right now we've allowlisted these base URLs, but I'm guessing I'm missing one (or some intermediate jumps?):

https://uts-ws.nlm.nih.gov/
https://utslogin.nlm.nih.gov/
http://umlsks.nlm.nih.gov/

Is there any documentation on which UMLS API endpoints are required by cpt4.jar? If not, can you point me to source code so I can dig through the API calls?

Thanks in advance!

alex-odysseus commented 2 years ago

So far this is the correct list of base URLs:

https://uts-ws.nlm.nih.gov
http://umlsks.nlm.nih.gov
https://utslogin.nlm.nih.gov
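
Firewall allowlists are often expressed as host and port pairs rather than URLs, so it can help to spell out what the base URLs above imply. The following is a minimal sketch, not part of cpt4.jar: `allowlist_entries` is a hypothetical helper name, and the port mapping assumes the scheme defaults (80 for http, 443 for https) because none of the URLs names an explicit port.

```python
from urllib.parse import urlsplit

# Default ports implied by the URL scheme when no explicit port is given.
DEFAULT_PORTS = {"http": 80, "https": 443}


def allowlist_entries(urls):
    """Return (hostname, port) pairs for a list of base URLs."""
    entries = []
    for url in urls:
        parts = urlsplit(url)
        # parts.port is None when the URL has no explicit port.
        port = parts.port or DEFAULT_PORTS[parts.scheme]
        entries.append((parts.hostname, port))
    return entries


if __name__ == "__main__":
    base_urls = [
        "https://uts-ws.nlm.nih.gov",
        "http://umlsks.nlm.nih.gov",
        "https://utslogin.nlm.nih.gov",
    ]
    for host, port in allowlist_entries(base_urls):
        print(f"{host}:{port}")
```

The output only restates what the URLs imply; it does not reveal any redirects or additional hosts the tool might contact.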

wtroddy commented 2 years ago

I've tested each of these individual URLs and can access them okay. I've also been able to generate a TGT using the same endpoints from a Python script, but I'm still getting an error. Here's the log: logfile.log

I still suspect the timeout is related to our restricted allowlist. Is the source code for the jar file public anywhere? I can't seem to find it, but I would like to try adding logging to see which HTTP request it is getting stuck on. Or do you know of another way to get more verbose logs showing where I might be getting stuck?

mik-ohdsi commented 2 years ago

Hi @wtroddy - in this forum post the TGT error was due to an invalid certificate

wtroddy commented 2 years ago

Hi @mik-ohdsi, awesome - thanks! I'd searched the forums but somehow missed this thread. I commented to see if they know which certificate was the problem but will do some investigation on our end in the interim.

mik-ohdsi commented 2 years ago

Hi @wtroddy - were you able to fix it? You could also try a new download from Athena, as the CPT4.jar was recently updated. If it works for you now, can you close the issue? If it doesn't, please tell us what is still not working.

wtroddy commented 2 years ago

Hi @mik-ohdsi - thanks for the update. I've been out of the office and haven't gotten back to this. I'll be back in the office after the long weekend in the US. When I'm back, I'll try the updated JAR file and update this issue accordingly. Thanks!

wtroddy commented 2 years ago

Just a quick update - I've tried the updated CPT4.jar and am still getting the same error. We're investigating the certificate possibility now but don't have any updates quite yet. Do you know if there's a way to get a more verbose message about why (or where in the process) the application is not able to request the TGT? I see there is a SocketTimeoutException but any additional information would be helpful since I'm able to connect to the endpoints successfully with other programs.

Thanks.

mik-ohdsi commented 2 years ago

Hi @wtroddy - I looked at the logfile again, and I agree that the problem is not so much in the ticket-requesting logic as in opening the socket connection needed to execute it. For reference, this is the UMLS documentation about the API and here are the Java samples. I still suspect the problem cannot be solved on our side and instead lies with the firewall that you have built around that VM. Do you have logging in place that would show the network traffic from that VM to the outside and any ports that may need to be opened? Maybe you can also run the shell script with elevated privileges and try logging the calls from inside the VM. Or perhaps the JRE in the VM is restricted in its outside communication. Are the other programs that you used successfully Java-based? What is the operating system, by the way?
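
One way to narrow down a socket-level failure is a plain TCP connection test run from inside the VM. The sketch below is only an illustration under stated assumptions: `probe_tcp` is a hypothetical helper, and the hosts and ports are taken from the base URLs listed earlier in this thread with their scheme defaults. It tests direct connections only, so it does not exercise TLS, a proxy, or the JVM's own network settings.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Try to open a TCP connection; return "ok" or the exception class name."""
    try:
        # create_connection resolves the host name and connects within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        # Covers DNS failures, refused connections, and timeouts alike.
        return type(exc).__name__


if __name__ == "__main__":
    # Assumed targets: hosts from the thread, ports implied by the URL schemes.
    targets = [
        ("uts-ws.nlm.nih.gov", 443),
        ("utslogin.nlm.nih.gov", 443),
        ("umlsks.nlm.nih.gov", 80),
    ]
    for host, port in targets:
        print(f"{host}:{port} -> {probe_tcp(host, port)}")
```

A timeout reported here for a host that other tools can reach would suggest those tools go through a proxy rather than connecting directly.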

wtroddy commented 1 year ago

Hi @mik-ohdsi - please pardon the delay on this. We're working with a platform vendor and it's taken some time for them to investigate on their end.

For your last two questions: my other programs were not Java-based (we're starting to test this now), and we're using Ubuntu 20.04.

The latest recommendation from our vendor was to pass our proxy host and port as arguments when running the Java app. I've tried to run the jar file with something like this:

java -Dhttp.proxyHost=$PROXY_HOST -Dhttp.proxyPort=$PROXY_PORT -Dumls-apikey=$UMLS_API_KEY -jar cpt4.jar 5

I'm still getting similar results, though. I've asked the vendor to confirm that this fix works with other allowlisted sites in a Java program, to try to isolate the problem.

In the meantime, I thought I'd check here - do you know of any reason the jar file wouldn't be making use of the additional Java flags when trying to connect to the UMLS?
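
One detail worth ruling out with the command above: `http.proxyHost` and `http.proxyPort` are the JVM's standard proxy properties for plain-HTTP URLs only. HTTPS URLs, which two of the three base URLs in this thread are, use the separate `https.proxyHost` and `https.proxyPort` properties. The sketch below builds a command line that sets both pairs. It is an illustration under assumptions: `build_cpt4_command` and the proxy host shown are hypothetical, and nothing in this thread establishes that cpt4.jar's HTTP client reads these properties at all (some Java HTTP libraries ignore them unless configured to use system properties).

```python
import shlex


def build_cpt4_command(proxy_host, proxy_port, api_key):
    """Build the java command with proxy properties for both HTTP and HTTPS."""
    proxy_flags = []
    for scheme in ("http", "https"):
        # Standard JVM networking properties; one pair per URL scheme.
        proxy_flags.append(f"-D{scheme}.proxyHost={proxy_host}")
        proxy_flags.append(f"-D{scheme}.proxyPort={proxy_port}")
    # Remaining arguments mirror the command quoted in this thread.
    return ["java", *proxy_flags, f"-Dumls-apikey={api_key}", "-jar", "cpt4.jar", "5"]


if __name__ == "__main__":
    # Hypothetical values; substitute the real proxy and API key.
    cmd = build_cpt4_command("proxy.example.internal", 3128, "YOUR_UMLS_API_KEY")
    print(shlex.join(cmd))
```

If the tool still times out with both pairs set, that would be consistent with its HTTP client not consulting the JVM proxy properties.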

irbraun commented 1 year ago

As a small update on this: the platform vendor clarified that they do run other Java programs on the VM where this problem is occurring, and in those cases they use the proxy host and port arguments to fix this allowlist issue.

wtroddy commented 1 year ago

Hi @mik-ohdsi and @alex-odysseus - it sounds like the issue might be around the proxy host/port arguments. I was just curious whether you know of any reason the cpt4.jar wouldn't be accepting or using those parameters, or have any other ideas about what the problem might be?

mik-ohdsi commented 1 year ago

Hi @wtroddy - I guess we simply haven't implemented support for proxy host/port arguments in the tool. Let me check with @alex-odysseus whether we can put this on the wish list. Meanwhile, as a workaround (and I know this is a bit inconvenient), can you run the CPT concept name resolution outside your VM and then move the processed CSV files there afterwards?