dice-group / gerbil

GERBIL - General Entity annotatoR Benchmark
GNU Affero General Public License v3.0

running models and obtaining results #459

Open rocky2397 opened 1 month ago

rocky2397 commented 1 month ago

Hi, I am currently trying to use GERBIL for the evaluation of some of your models (specifically D2KB with mGENRE and our imported model SpEL, hosted on our localhost).

We already deactivated the HTTPBasedSameAsRetriever in gerbil.properties, but GERBIL still follows all the redirects, which makes the evaluation really slow.

Is there a way to fix this, or is there maybe a problem with the code that needs to be updated?

Any suggestion would be really helpful at this point.

MichaelRoeder commented 1 month ago

Hi, Thank you for using GERBIL.

How do you know that it still looks for the redirects? :thinking:

When you run an experiment and you have the impression that GERBIL is too slow, could you please open http://localhost:1234/gerbil/running (please adapt the address as needed), copy the stack trace(s) that you see there and paste them here? That could help a lot to figure out what GERBIL is actually doing :wink:
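If copying from the browser is awkward, the same page can also be fetched from a script. This is only a minimal sketch: it assumes GERBIL is reachable at the address mentioned above and that the page can be read as plain text. `fetch_running` is a hypothetical helper name, not something GERBIL provides.

```python
import urllib.request

# Address taken from the comment above; adapt host and port to your setup.
RUNNING_URL = "http://localhost:1234/gerbil/running"


def fetch_running(url: str = RUNNING_URL, timeout: float = 10.0) -> str:
    """Return the body of GERBIL's /running page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `print(fetch_running())` while an experiment is running should then print whatever the page currently shows, ready to be pasted into the issue.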

rocky2397 commented 1 month ago

So this is basically what is shown on the /running page:

eTConfig("SpEL MSNBC (NIF WS)","MSNBC","D2KB","STRONG_ANNOTATION_MATCH") state=RUNNABLE progress=null
    java.base/sun.nio.ch.Net.poll(Native Method)
    java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:191)
    java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:201)
    java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:309)
    java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:346)
    java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:796)
    java.base/java.net.Socket$SocketInputStream.implRead(Socket.java:1108)
    java.base/java.net.Socket$SocketInputStream.read(Socket.java:1095)
    java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:489)
    java.base/sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:483)
    java.base/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:70)
    java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1462)
    java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1068)
    org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
    org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
    org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
    org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
    org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
    org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
    org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
    org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
    org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
    org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
    org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
    org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
    org.aksw.gerbil.dataset.check.impl.HttpBasedEntityChecker.entityExists(HttpBasedEntityChecker.java:64)
    org.aksw.gerbil.dataset.check.impl.EntityCheckerManagerImpl.checkUri(EntityCheckerManagerImpl.java:135)
    org.aksw.gerbil.dataset.check.impl.FileBasedCachingEntityCheckerManager.checkUri(FileBasedCachingEntityCheckerManager.java:156)
    org.aksw.gerbil.dataset.check.impl.EntityCheckerManagerImpl.checkMeaning(EntityCheckerManagerImpl.java:84)
    org.aksw.gerbil.dataset.check.impl.EntityCheckerManagerImpl.checkMarkings(EntityCheckerManagerImpl.java:64)
    org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getPreparedDataset(AbstractDatasetConfiguration.java:79)
    org.aksw.gerbil.dataset.SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:50)
    org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
    org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:106)
    org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
    java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    java.base/java.lang.Thread.run(Thread.java:1570)

Additionally, the following appears in my terminal when I try to run it on the MSNBC dataset. I added the annotator using the config web UI and that worked, but I just cannot run the model completely. It only works with KORE50.

[Screenshot 2024-10-10 at 15 00 20]

MichaelRoeder commented 1 month ago

From the information under /running, we can see that GERBIL is investing time to check the dataset for faulty entities. This is another step after the sameAs retrieval that can (unfortunately) take quite a lot of time. If you would like to skip this step, please have a look at the wiki article about HTTP-based entity checking.
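This reading comes from the topmost GERBIL frame in the trace: the frames above it are JDK socket and Apache HttpClient calls, and the first `org.aksw.gerbil` frame (`HttpBasedEntityChecker.entityExists`) names the step that is waiting on the network. As a rough sketch, the same lookup can be done in Python on a pasted trace. `first_gerbil_frame` is a hypothetical helper, not part of GERBIL, and the pattern assumes frames formatted as on the /running page.

```python
import re
from typing import Optional

# Matches frames such as
# org.aksw.gerbil.dataset.check.impl.HttpBasedEntityChecker.entityExists(HttpBasedEntityChecker.java:64)
GERBIL_FRAME = re.compile(r"org\.aksw\.gerbil\.[\w.$]+\([\w$]+\.java:\d+\)")


def first_gerbil_frame(trace: str) -> Optional[str]:
    """Return the topmost GERBIL frame of a pasted stack trace, or None."""
    match = GERBIL_FRAME.search(trace)
    return match.group(0) if match else None
```

A result inside the `org.aksw.gerbil.dataset.check` package, as in the trace above, points at entity checking rather than at the sameAs retrieval.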

I am not sure whether I understand the Python stack trace of your program correctly. It seems like GERBIL is sending a request without a NIF context. However, that would be very surprising. :thinking: Maybe you can turn off the entity checking and then try the different datasets that you would like to use. If the error still occurs, I am happy to have another look at it :slightly_smiling_face:
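To find out whether a request really arrives without a NIF context, the annotator service could log each incoming request body together with the result of a quick check like the one below. This is a deliberately crude sketch: it only looks for the text `nif:Context` or the full NIF Core IRI of that class, and it assumes the conventional `nif:` prefix. `mentions_nif_context` is a hypothetical helper; a proper check would parse the RDF.

```python
# Namespace of the NIF Core ontology; nif:Context is the class of the document text.
NIF_CORE = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"


def mentions_nif_context(body: str) -> bool:
    """Crude textual check: does the request body mention nif:Context at all?"""
    # Assumption: the sender either uses the usual "nif:" prefix or writes the full IRI.
    return (NIF_CORE + "Context") in body or "nif:Context" in body
```

If this returns `False` for a request that makes the service fail, the logged body is worth attaching to the issue.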