bjascob / amrlib

A python library that makes AMR parsing, generation and visualization simple.
MIT License

Ask questions about mapping errors #73

Open Jolin-rgb opened 5 days ago

Jolin-rgb commented 5 days ago

Do I need to do additional preprocessing of the text before using the stog parser? I get a lot of mapping errors after my text is processed with the parser.

bjascob commented 5 days ago

It's not uncommon for the parser to log a few warnings. This is the "deserializer" taking the raw output of the language model and re-formatting it to a proper AMR graph. However, it could be that your incoming sentences are difficult to parse. Make sure you're only parsing a single sentence and not an entire paragraph or block of text all at once. Also make sure you're feeding it proper English sentences and not just data such as a list of items or other non-sentence formatted text.

If you're not doing any of the above and your output looks reasonable, then this is likely just the normal minor formatting errors that the model sometimes produces. If you want to experiment with it more, you can take the AMR 3.0 test set, process it through your code and run SMatch scoring on the results. Scores should be in the low 80s.
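
For reference, SMatch reports an F-score over the triples matched between the predicted graph and the gold graph, so "low 80s" means an F-score a little above 0.80 once scaled to 100. The arithmetic is sketched below as a minimal illustration; the function name and the counts are hypothetical and are not part of amrlib or the smatch package.

```python
def smatch_f_score(matched, test_total, gold_total):
    """F-score from triple counts (hypothetical helper, for illustration only).

    matched    -- number of triples that match between the two graphs
    test_total -- number of triples in the predicted graph
    gold_total -- number of triples in the gold graph
    """
    precision = matched / test_total if test_total else 0.0
    recall = matched / gold_total if gold_total else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 82 matched triples, 100 predicted, 100 gold.
score = smatch_f_score(82, 100, 100)
print(round(100 * score, 1))
```

In practice the matching itself (finding the best variable alignment between the two graphs) is the hard part and is what the SMatch tooling does; only the final scoring step is shown here.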

Jolin-rgb commented 4 days ago

Thank you very much for your detailed reply. I would like to confirm the definition of a single sentence, though. For example, the sentence I entered into the parser is:

"The Arlington County Board plans to vote Saturday afternoon on giving Amazon $23 million and other incentives to build a headquarters campus in Crystal City, but only after hearing scores of northern Virginia residents and advocates testify for or against the project."

Split by period, this is a single sentence, but the parser gives the following error: [image: 1726288938830]

It still produces the following plots: [image: 1726289054609] [image: 1726289072688]

But when I split the input at the comma into two pieces, "The Arlington County Board plans to vote on Saturday afternoon on giving Amazon $23 million and other incentives to build a headquarters campus in Crystal City," and "but only after hearing scores of northern Virginia residents and advocates testify for or against the project.", the parser gives a plot with no errors.

Is there a clear length limit for a single sentence, or should the text be divided at commas as well as periods? Or can these three mapping-error messages be ignored? I want to use the AMR results for downstream tasks, so this matters to me. I hope to hear from you, thank you very much!

bjascob commented 4 days ago

Just do normal sentence splitting at the period. This is normal. If you're really worried about performance, bart-large will give slightly better results.
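
A minimal sketch of "split at the period, then parse each sentence" might look like the following. The regex splitter is deliberately naive (it will wrongly split after abbreviations such as "Dr."), and `split_sentences` and `parse_paragraph` are hypothetical helper names, not part of amrlib. The only amrlib calls used are `amrlib.load_stog_model()` and `stog.parse_sents()`, and they assume a parse model has already been downloaded and installed.

```python
import re

# Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
# Commas are NOT treated as boundaries.
_SENT_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Return the non-empty sentences in `text` (hypothetical helper)."""
    return [s.strip() for s in _SENT_END.split(text.strip()) if s.strip()]


def parse_paragraph(text, stog=None):
    """Parse each sentence separately and return (sentence, graph) pairs.

    `stog` is any object with a parse_sents(list_of_str) method. If it is
    omitted, the default amrlib model is loaded, which assumes amrlib and a
    parse model are installed.
    """
    if stog is None:
        import amrlib  # lazy import: only needed when no parser is supplied
        stog = amrlib.load_stog_model()
    sentences = split_sentences(text)
    graphs = stog.parse_sents(sentences)
    return list(zip(sentences, graphs))
```

With this splitter, the long Arlington County Board example stays as one sentence, since its only period is at the end; the comma before "but only after" does not trigger a split.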