Vitaliy-1 / grobidPlugin


Problems with Grobid api #1

Open pakojil opened 3 years ago

pakojil commented 3 years ago

Dear @Vitaliy-1,

First of all, thank you for your effort in maintaining so many solutions and continually improving them. Please excuse my ignorance, but I have a problem that I cannot solve.

I have installed your grobidPlugin, and I have a Grobid installation on my computer with the web services running on port 8070. I have configured the URL correctly, but I see in the GrobidPlugin.inc.php file that you call this address: const GROBID_SERVICE_API_PATH = "/api/processFulltextDocumentJATS". Unfortunately, my Grobid installation does not have that service. Is there a way to add it? If so, could you tell me how to do it?

Thanks a lot. Best regards

NOTE: My environment is:

OJS version: 3.1.2.2
OS platform: Darwin (macOS Catalina)
PHP version: 7.3.11
Apache version: Apache/2.4.41 (Unix) PHP/7.3.11
Database driver: mysqli
Database server version: 8.0.22
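For anyone hitting the same symptom, a quick way to confirm whether a given Grobid server exposes the JATS route is to probe it directly. The following is only a minimal sketch: the helper names (build_url, endpoint_exists) are hypothetical, the base URL is assumed to be the local server on port 8070 described above, and it assumes that a server answers 404 for a route it does not know and some other status (for example 405 for a GET on a POST-only route) for a route that is registered.

```python
import urllib.error
import urllib.request

# Path quoted from the plugin constant; assumed to exist only in the forked Grobid.
JATS_PATH = "/api/processFulltextDocumentJATS"
# Health-check route of the stock Grobid REST service.
ALIVE_PATH = "/api/isalive"


def build_url(base_url: str, path: str) -> str:
    """Join the server base URL and an API path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def endpoint_exists(base_url: str, path: str, timeout: float = 5.0) -> bool:
    """Return False on a 404 or an unreachable server, True otherwise."""
    request = urllib.request.Request(build_url(base_url, path), method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError as error:
        # Any status other than 404 suggests the route is registered,
        # even if it rejects a plain GET request.
        return error.code != 404
    except OSError:
        # Connection refused, DNS failure, timeout, and similar errors.
        return False
```

Calling endpoint_exists("http://localhost:8070", ALIVE_PATH) first shows whether the server is reachable at all; if that succeeds while endpoint_exists("http://localhost:8070", JATS_PATH) returns False, the running Grobid build most likely lacks the JATS service.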

Vitaliy-1 commented 3 years ago

PDF to JATS conversion is implemented only in the dev branch of my fork: https://github.com/Vitaliy-1/grobid/tree/dev. It has been neglected for some time but should work. These 3 commits are responsible for the JATS conversion:

https://github.com/Vitaliy-1/grobid/commit/a8cb148c2821ff62d4b14a920c65cb7f41fce2eb
https://github.com/Vitaliy-1/grobid/commit/f4809cfccd8a2c98e8c179fae9bc5164996ab402
https://github.com/Vitaliy-1/grobid/commit/2f2ea0eb31d008a2c906fe717782c5490cb71f09

I'm not sure whether they can be applied cleanly to the current master branch. It will probably require some work if you want to use the latest code.

Vitaliy-1 commented 3 years ago

Let me know if you encounter problems. I'll gladly help.

pakojil commented 3 years ago

Thank you very much, @Vitaliy-1, for the prompt response and for the information. I'll see if I can get it running, and I will keep you informed of my progress (or lack of it). Best regards

pakojil commented 3 years ago

Hi, @Vitaliy-1

I have the Grobid server working with your modifications.

I am now going to test how it works. At first glance, I notice that the references are handled very well and are broken down into the necessary tags.

On the other hand, I am not sure that the treatment of images, figures, tables, etc. is the desired one.

Besides, something happened to me that I don't understand. I have converted the PDF you see in the capture, and the resulting XML starts with page 2. I don't know why the first page has been omitted, which is where the <front> elements are.

I'm going to do some research, and if I see anything to comment on, I will.

Thank you so much for everything.

Please feel free to tell me if my comments are tiresome or unwanted.

Best regards

PS: I could upload the converted PDF for you, but it's a real article from a journal published by my university, and I don't have permission to share it publicly.

If you can suggest a way to send it privately, I would have no problem doing so, should you need it for testing.

[Attached image: screenshot of the converted PDF]

Vitaliy-1 commented 3 years ago

> On the other hand, I am not sure that the treatment of images, figures, tables, etc. is the desired one.

> Besides, something happened to me that I don't understand. I have converted the PDF you see in the capture, and the resulting XML starts with page 2. I don't know why the first page has been omitted, which is where the <front> elements are.

Does this happen with the TEI XML conversion too? I'm trying to understand whether this is a problem with the JATS conversion only or with Grobid in general.
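One way to answer that question is to run the same PDF through both conversions and compare where each output's <body> begins. The sketch below is illustrative only: the function names are hypothetical, it assumes the TEI output uses the standard TEI namespace, and it assumes the JATS output uses un-namespaced <article>, <front>, and <body> elements.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace, in ElementTree's "{uri}" notation.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def first_body_text(xml_text: str, namespace: str = "") -> str:
    """Return the first non-empty paragraph inside <body>, or "" if none."""
    root = ET.fromstring(xml_text)
    body = root.find(f".//{namespace}body")
    if body is None:
        return ""
    for paragraph in body.iter(f"{namespace}p"):
        # Collapse runs of whitespace so both outputs compare cleanly.
        text = " ".join("".join(paragraph.itertext()).split())
        if text:
            return text
    return ""


def has_front_title(jats_text: str) -> bool:
    """Report whether a JATS document has a non-empty <article-title> in <front>."""
    root = ET.fromstring(jats_text)
    title = root.find(".//front//article-title")
    return title is not None and bool("".join(title.itertext()).strip())
```

If first_body_text(tei_xml, TEI_NS) and first_body_text(jats_xml) return the same page-2 text, the omission comes from Grobid's own extraction and not from the JATS transformation; has_front_title(jats_xml) shows whether the first-page metadata reached <front> at all.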

pakojil commented 3 years ago

Hi, Vitaliy. Indeed, the problem seems to be in Grobid's conversion to TEI, not in your transformation to JATS.

As you can see in the XML code that I have pasted, the <body> section begins with the same content I mentioned, which corresponds to the beginning of the second page of the original PDF.

[Attached image: screenshot of the XML output]

I'm sorry I bothered you.

Thanks for your time and attention

Vitaliy-1 commented 3 years ago

I have rebased these changes onto the current master branch here: https://github.com/Vitaliy-1/grobid/tree/main. I think I resolved all the conflicts correctly. It works, but there are some drawbacks: for example, the mechanism for getting the article title, the reference volume, and the first and last page has changed, so the JATSFormatter class needs an update. You can try it; maybe the problem you mentioned is resolved there. On the other hand, producing the same results as in TEI requires some additional work (because of the changes described). I don't have much time to spend on it at the moment, but I can point out what needs to be changed and where to find code samples.

pakojil commented 3 years ago

Hi Vitaliy. Thank you very much for everything. Don't worry for now; I don't want to take up your time. We are all very busy, and I understand perfectly. I will test the changes and report back. Again, thank you very much, and don't worry. Greetings