Closed schmmd closed 6 years ago
We got it from here: http://cs.stanford.edu/~ppasupat/research/h-strict-all-matching-lfs.tar.gz. Though see this paper: https://www.semanticscholar.org/paper/It-was-the-training-data-pruning-too!-Mudrakarta-Taly/09b94316daac1bb88e9a052aa2f8f662c1c6c469. Also note that our current implementation of this parser is about 4.5-5 points behind the original implementation, and we're still trying to figure out the cause of the difference. We're holding off on cleaning up the code and writing tutorials for the semantic parsing stuff until we've matched the performance of the original parser (and we're pretty busy with EMNLP for the next week, anyway).
@matt-gardner Thank you very much! Good luck with your EMNLP!
Hi, how can I run the preprocess_wikitables.py script?
We're not quite ready to fully support this code yet; I'll be substantially cleaning it up in the next couple of weeks, now that the EMNLP deadline has passed. We're aiming to include the parser, a demo, and some nice tutorials in an upcoming major release of AllenNLP. Until then, things are a bit rough.
For now, download the data from the dataset's website, download the DPD file linked above, set up the config file to point to the right files (see the example in training_config/, but change the data paths to point to .examples files instead of .jsonl files), and run the script. If you need more help than that, it's probably best to wait until we have some tutorials written.
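To make the config change above concrete, here is a rough sketch of patching the data paths. The paths and the stripped-down config are placeholders (the real example under training_config/ also has model, dataset reader, and trainer sections); only `train_data_path` and `validation_data_path` are standard AllenNLP config keys:

```python
import json

# A minimal stand-in for the example config under training_config/ (placeholder
# paths; the real file also has model, dataset reader, and trainer sections).
config = {
    "train_data_path": "/data/WikiTableQuestions/random-split-1-train.jsonl",
    "validation_data_path": "/data/WikiTableQuestions/random-split-1-dev.jsonl",
}

# Swap the data paths from the .jsonl files to the .examples files.
for key in ("train_data_path", "validation_data_path"):
    config[key] = config[key].replace(".jsonl", ".examples")

print(json.dumps(config, indent=2))
```

Write the patched dict back out to a JSON file and point the training command at that instead of the original example config.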
Thank you for the prompt reply :-) I've just found out how to run it. However, there are lots of missing DPD files (I'm trying to load random split 1); is that normal?
I'm looking forward to the documentation. Please let me know if there is any way I can help. All the best, and good luck with EMNLP.
Yes, only about 2/3 of the data has DPD logical forms, which means you're throwing out a bunch of your training data right off the bat. Our EMNLP submission tries to address this, at least in part...
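If you want to check DPD coverage on your own split, a quick sketch follows. The per-example file layout here is an assumption (one gzipped file named by example id, e.g. "nt-0.gz"), so check the unpacked tarball for the actual naming scheme:

```python
import os

def dpd_coverage(example_ids, dpd_dir):
    """Return the example ids that have a DPD logical-form file on disk.

    Assumes (hypothetically) one gzipped file per example id, e.g. "nt-0.gz";
    inspect the unpacked DPD tarball for the real layout.
    """
    return [ex_id for ex_id in example_ids
            if os.path.exists(os.path.join(dpd_dir, ex_id + ".gz"))]
```

Running this over the ids in your .examples file should show roughly the 2/3 coverage mentioned above; the rest of the examples get dropped from training.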
That sounds interesting. Too bad we have to wait until August to check out your paper :-)
Hi, can you share the hyper-parameters/random seed of the model? So far I can only get 35% accuracy.
Yes, I mentioned above that we were ~5% behind the original parser. When I ran the original parser on the same data that I gave this one, I got around 40.5% on dev. The highest I've gotten with this implementation is ~35.7%. My current guess is that this is due to how we're handling things like (number 1) in the logical form, but I'm still looking into it.
Thanks for making this available. We are thinking of starting a project on semantic parsing on Wikitables, and I'm wondering whether to use this implementation or the original Java/Scala implementation as the starting point. What do you recommend? Are you still 5% behind the original parser? That's a big performance gap.
Also, any chance of getting a preprint of your EMNLP submission? :smile:
Thanks again for this wonderful work,
Mark
We've gotten up to 38% now, though that's still 2-2.5% behind the original implementation. I've mostly just been refactoring and cleaning up the code, without looking into the performance difference much. We have state-of-the-art results for NLVR, so the basic framework appears to be working; something's just up with the WikiTables parser. It might be due to trying to interface with the SEMPRE executor: we had to recreate SEMPRE's table-to-logical-form logic, and there are some corner cases that might be messing something up there. We're currently looking into using Chen Liang's simpler language from his MAPO work. We'll see if that gives us better performance.
So, that's a bit of a long explanation that doesn't really answer your question... If you want to use the original implementation, just note that there are some serious dependency headaches (it depends on a particular old version of DyNet, and DyNet rewrote their git history, so you can't actually find that commit on their repo anymore), and Jayant's gone and no one is supporting his old code. Pick your poison =).
Thanks for the summary! I'm pleased to hear that you are slowly catching up to the published results.
The wikitables parser is merged into master, the demo will be live on demo.allennlp.org soon, and we're not doing any more development on the SEMPRE lambda-DCS version. I'm guessing that the remaining ~2% discrepancy is due to differences in lemmatization and number detection between spacy and CoreNLP - we found some errors where, e.g., "Russian" didn't match to "Russia" because they didn't map to the same lemma, and other similar issues.
We are still working on the simpler language in the MAPO paper, and I'm hopeful that that will give improved performance. But, I'm closing this issue as finished. Official launch of the semantic parsing framework should happen within the next two weeks.
Hi @matt-gardner, the link you provided above for the DPD denotations doesn't exist anymore. I found some resources from P. Pasupat's GitHub, but h-strict-all-matching-lfs.tar.gz doesn't exist there anymore either: https://nlp.stanford.edu/software/sempre/wikitable/dpd/
Would the whole dump of denotations work? It's the first zip at the above link.
The last zip file is the one we used. Note, though, that it uses additional human annotation beyond the first file.
Hi @matt-gardner, thanks for the quick reply. The last zip file doesn't exist on this link anymore either. Is there any other way to get that zip?
Hmm, that's odd, I'd ask Ice (Panupong Pasupat) about that. If for some reason he really doesn't want to host it anymore, we can figure out a place to put it.
Sure Thanks :+1:
@YoPatapon I have checked that link, but the fourth file (the filtered logical forms) isn't there if you try to download it.
I found it can be downloaded now.
@matt-gardner Hi, Matt. I noticed you mentioned that your original implementation achieves 40.5% accuracy on the dev set, but the accuracy reported in the paper is 42.7% with a single model. Is that an average accuracy over k-fold cross-validation, or something else? Thanks.
The ~40.5 number for the original implementation was calculated on the dev set of split 1, training with 10 logical forms. This is one run of one fold of what's in table 4 in the original paper. That's the setting we were starting with when re-implementing it, so that's the number we were comparing against.
Hi, could you provide the DPD file?