WING-NUS / Neural-ParsCit

Neuralized version of the Reference String Parser component of the ParsCit package.
http://wing.comp.nus.edu.sg/parsCit

Output result is not same with online demo. #24

Closed: hiber-niu closed this issue 5 years ago

hiber-niu commented 5 years ago

Hi, many thanks for this great work.

  1. I ran this following the instructions in the README file and got output that differs from the online version.

  2. Also, compared with the online demo, the provided run.py cannot easily combine words and tags into citations.

The attachment below was used for testing: pdf_text_for_test.txt

kylase commented 5 years ago

Hi, the online version is the CRF-based model and not the NN-based model, hence the difference.

hiber-niu commented 5 years ago

@kylase Does this version support combining these tags into complete citations? I tested the previous text and found that all the tags are joined together, so they cannot be split into individual citations. BTW, if I feed this program text extracted with Tika that contains empty lines, it does not work properly.
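For readers with the same question, here is a minimal sketch of how per-token tags could be merged into fields after the fact. It assumes the tagger output for one reference string is available as a list of (token, tag) pairs; that shape, the function name `group_tagged_tokens`, and the sample tags are hypothetical and not taken from run.py.

```python
from itertools import groupby


def group_tagged_tokens(pairs):
    """Merge consecutive tokens that share a tag into (tag, text) fields.

    `pairs` is assumed to be a list of (token, tag) tuples for a single
    reference string; this input shape is an assumption for illustration.
    """
    fields = []
    # groupby only merges adjacent items, so field order is preserved.
    for tag, run in groupby(pairs, key=lambda pair: pair[1]):
        text = " ".join(token for token, _ in run)
        fields.append((tag, text))
    return fields


# Hypothetical tagged output for one reference string.
example = [
    ("Smith", "author"), ("J.", "author"),
    ("2010", "date"),
    ("Parsing", "title"), ("citations", "title"),
]
fields = group_tagged_tokens(example)
```

This only groups fields inside a single reference string. It does not recover the boundaries between citations, so the input would still need to be split into one reference string per call beforehand.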

kylase commented 5 years ago

I need to clarify what Neural-ParsCit and ParsCit are.

It is a naming issue (it is a bit confusing): Neural-ParsCit is the NN-based reference string parser of ParsCit. It doesn't do document section labeling (SectLabel); one of the PhD students in the lab is working on porting that to an NN-based model.

Neural-ParsCit only parses reference strings and nothing else. If you are looking for whole-document parsing, ParsCit should be your solution for now, until the PhD student finishes that work.
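Since only reference strings are parsed, text extracted from a PDF would have to be reduced to reference strings first. A rough sketch of that preprocessing, assuming each non-empty line of the extracted text is already one complete reference string (an assumption that real Tika output may not satisfy); `iter_reference_strings` is a hypothetical helper, not part of the repository:

```python
def iter_reference_strings(raw_text):
    """Yield non-empty, whitespace-trimmed lines from extracted text.

    Assumes one reference string per non-empty line; blank and
    whitespace-only lines are skipped.
    """
    for line in raw_text.splitlines():
        stripped = line.strip()
        if stripped:
            yield stripped


# Hypothetical extracted text with blank lines between references.
sample = "Smith J. 2010. Parsing citations.\n\n   \nDoe A. 2012. Another paper.\n"
references = list(iter_reference_strings(sample))
```

Each yielded string could then be passed to the parser on its own, which also keeps the tags of different citations from running together.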

hiber-niu commented 5 years ago

Thank you for the explanation.

I have read the Neural-ParsCit paper and figured this out. I have decided to use ParsCit, although I am not familiar with Perl.