sheng-z opened this issue 5 years ago
As a large vocabulary might pose training challenges (hence the need for copying), and recategorization is needed to create one-to-one alignments, I don't think it is feasible to perform an ablation here.
If you replace the sequence tagging problem with latent alignment by seq2seq, it seems everything else could be kept the same. However, I am not sure how relation prediction should interact with the recategorization system.
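For context, the copy mechanism alluded to above can be reduced to a simple decode-time rule: ids below a cutoff index a fixed concept vocabulary, while ids above it point into the source sentence, so rare or unseen concepts are copied from the input rather than generated from a huge vocabulary. A minimal sketch (all names and the toy vocabulary are hypothetical, not the repo's actual API):

```python
def resolve_prediction(pred, src_tokens, copy_offset):
    """Map a model prediction id back to a concept string.

    Ids below `copy_offset` index a fixed concept vocabulary;
    ids at or above it point into the source sentence, so the
    rare/unseen concept is copied from the input instead of
    being generated from the full vocabulary.
    """
    concept_vocab = ["person", "want-01", "go-02"]  # toy vocabulary
    if pred < copy_offset:
        return concept_vocab[pred]
    return src_tokens[pred - copy_offset]  # copy from the sentence

tokens = ["Obama", "wants", "to", "go"]
print(resolve_prediction(1, tokens, 3))  # generated: "want-01"
print(resolve_prediction(3, tokens, 3))  # copied: "Obama"
```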
Regards, Chunchuan
On Mon, 26 Nov 2018, 17:08 Sheng Zhang <notifications@github.com> wrote:
Hi Lyu, impressive work! I found that a large portion of the preprocessing code is devoted to building the copying dictionary and recategorization, which seems nontrivial. Since you didn't report an ablation study for this part in the paper, I wonder:
- How will the model perform w/o copying/recategorization?
- Suppose a parser takes (preprocessed-and-linearized) AMR strings and (preprocessed) English strings as input; is it possible to reuse your preprocessing step in that parser? Because it's fairer to compare based on the same preprocessed input, right?
Thanks!
Hi Chunchuan, The preprocessing seems very complicated. I wonder if there is a simple way to perform the transformation between a raw AMR and its preprocessed AMR; I mean, the input and output are both raw AMR text. That should also solve @sheng-z's question.
Thanks!
Yes. I've read some AMR papers recently. Nearly all of them do recategorization in preprocessing, but few give details about it. Everyone seems to reinvent the wheel (at least partially). Recategorization now seems to be a necessary step in AMR parsing. Sadly, as far as I know, no one has yet conducted a thorough analysis of it. This can be frustrating, since it's unclear how much of the improvement comes from the recategorization.
@ChunchuanLv, it would be a great contribution if anyone could standardize, or just detail, the following pre- and post-processing steps, so that research can really focus on parsing rather than recategorization, and comparisons of parsing algorithms can be made more fairly.
# Preprocessing
AMR -> [Recategorize] -> recategorized AMR
# Training/Testing
txt -> [Parser] -> recategorized AMR
# Postprocessing
recategorized AMR -> [Recover] -> AMR
Best, Sheng
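The scheme above essentially asks for two inverse transformations sharing a lexicon. A toy sketch of what a standardized interface might look like (the lexicon entries and labels here are invented for illustration; real recategorization covers named entities, dates, sense-tagged frames, etc.):

```python
# Toy lexicon: recategorized label <-> original AMR fragment.
LEXICON = {
    "NE_country_America": '(c / country :name (n / name :op1 "America"))',
    "DATE_2018-11-26": "(d / date-entity :year 2018 :month 11 :day 26)",
}
RECOVER = {frag: label for label, frag in LEXICON.items()}

def recategorize(amr: str) -> str:
    """AMR -> recategorized AMR: collapse known fragments into single labels."""
    for frag, label in RECOVER.items():
        amr = amr.replace(frag, label)
    return amr

def recover(amr: str) -> str:
    """recategorized AMR -> AMR: expand labels back into their fragments."""
    for label, frag in LEXICON.items():
        amr = amr.replace(label, frag)
    return amr

raw = '(w / want-01 :arg0 (c / country :name (n / name :op1 "America")))'
assert recover(recategorize(raw)) == raw  # lossless round trip on covered fragments
```

The key property is that the round trip is lossless on everything the lexicon covers, which is exactly what would let post-processing be factored out of parser comparisons.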
Indeed this is a problem; we actually think the AMR treebank people should provide such a conversion (a lexicon). There are resources like https://amr.isi.edu/doc/amr-dict.html, but they are only human-readable. Also, the conversion might be hard to standardise, because different algorithms may want to handle things differently. For example, seq2seq models only want to linearize the AMR while keeping relations and nodes together, whereas my model actually goes like this:
txt -> [Parser] -> recategorized AMR
recategorized AMR -> [Recover] -> AMR concepts -> [relation prediction] -> AMR
But in principle, if such a scheme existed, it could be a good starting point.
Chunchuan
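The two-stage data flow above can be sketched end to end. Assuming (hypothetically) a concept identifier that has already produced recovered concepts, and a relation scorer over ordered concept pairs, the final assembly step is just an argmax per candidate edge:

```python
# Hypothetical second stage: given recovered concepts, pick the best
# relation (or none) for each ordered concept pair from a score table.
def predict_relations(concepts, scores, threshold=0.5):
    """Return (head, relation, dependent) triples scoring above threshold."""
    edges = []
    for i, head in enumerate(concepts):
        for j, dep in enumerate(concepts):
            if i == j:
                continue
            rel, score = max(scores.get((i, j), {"NONE": 0.0}).items(),
                             key=lambda kv: kv[1])
            if rel != "NONE" and score >= threshold:
                edges.append((head, rel, dep))
    return edges

concepts = ["want-01", "person"]
scores = {(0, 1): {":arg0": 0.9, "NONE": 0.1}}
print(predict_relations(concepts, scores))
# -> [('want-01', ':arg0', 'person')]
```

Note how, in this factoring, relation prediction operates on recovered concepts, which is why it can interact awkwardly with recategorization: collapsed nodes hide internal structure from the scorer.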
@ChunchuanLv thanks for your thoughts. Just another view of your model: unlike others, in your design the heavy pre-/post-processing happens only around the concept identification stage. The data flow can be drawn as:
txt -> recategorized concepts -> concepts -> AMR
Then what we are looking for (or are interested in studying the impact of) is:
AMR -> recategorized concepts (or better, the preprocessed AMR)
recategorized concepts -> concepts
If simple implementations of the above two transformations exist, that will help answer the questions.
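The first transformation is close to just reading the concept labels off the graph. A rough sketch of extracting concepts from a PENMAN-style string (this assumes simple, well-formed input; real AMRs need a proper parser such as the `penman` library):

```python
import re

def extract_concepts(amr: str):
    """Pull concept labels out of a PENMAN string: every '(var / concept'."""
    return re.findall(r"\(\s*\w+\s*/\s*([^\s()]+)", amr)

amr = "(w / want-01 :arg0 (p / person) :arg1 (g / go-02 :arg0 p))"
print(extract_concepts(amr))  # -> ['want-01', 'person', 'go-02']
```

The second transformation (recategorized concepts -> concepts) would then be a lexicon lookup over these labels, expanding each collapsed label back to its original concept(s).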