AlexKuhnle / ShapeWorld

MIT License

Adding new shapes #25

Closed dschaehi closed 4 years ago

dschaehi commented 4 years ago

Hi @AlexKuhnle, I wanted to add new shapes to the existing ones (such as letters and numbers), so I added some Shape subclasses to shape.py and corresponding entries in english.json. However, it seems that the shape names also have to be in the english.dat file so that they are covered by the grammar. Is this correct? If so, is there an easy way to add the shapes to english.dat? I spent quite a bit of time trying to figure this out, but without success. Any help is appreciated. Thanks!

dschaehi commented 4 years ago

I managed to download ERG and to add new words to lexicon.tdl. I guess this is the right approach, right? Needed a lot of research to figure this out though.

AlexKuhnle commented 4 years ago

Hi, yes, if you add new nouns, they have to also be part of the grammar for the system to be able to generate corresponding sentences. Unfortunately this is not straightforward. If you want to get deeper into the grammar aspect (ERG/ACE/DMRS), I can probably point you to 1-2 things, or maybe I can check for myself whether I can quickly add some words for you.

However, the easiest option is to use existing nouns, say "apple" for "A". You can then either replace words accordingly in a postprocessing step, or if you train a model on only ShapeWorld data, the model can just learn that "apple" refers to "A". :-)
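As a rough illustration of the postprocessing idea, here is a minimal Python sketch. The mapping, the function name, and the caption strings are hypothetical and not part of ShapeWorld; the sketch only assumes that captions are plain strings.

```python
import re

# Hypothetical mapping from stand-in nouns already known to the grammar
# to the shape names that should appear in the final captions.
PLACEHOLDER_TO_SHAPE = {"apple": "A", "banana": "B"}


def replace_placeholders(caption, mapping=PLACEHOLDER_TO_SHAPE):
    """Replace whole-word placeholder nouns (and their regular plurals)."""
    # Longest keys first, so that overlapping placeholders cannot shadow each other.
    words = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in words) + r")(s?)\b",
        re.IGNORECASE,
    )

    def substitute(match):
        # Keep a trailing plural "s" so that "apples" becomes "As".
        return mapping[match.group(1).lower()] + match.group(2)

    return pattern.sub(substitute, caption)
```

For example, `replace_placeholders("There is a red apple.")` returns `"There is a red A."`. Words that merely contain a placeholder, such as "pineapple", are left alone because of the word-boundary anchors.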

AlexKuhnle commented 4 years ago

> I managed to download ERG and to add new words to lexicon.tdl. I guess this is the right approach, right? Needed a lot of research to figure this out though.

Yes, this is the right approach. I've modified a few ERG files for ShapeWorld which I can send you, and then for adding new words you just need to modify lexicon.tdl, as you suggest.
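One way to draft a new lexicon.tdl entry is to clone an existing noun entry and substitute the new word. The Python sketch below assumes the entry layout shown in `TEMPLATE_ENTRY`, which is only meant to illustrate the general style of ERG noun entries; the exact type name and feature paths should be copied from a real entry in the ERG revision being used. The function name is hypothetical.

```python
import re

# Illustrative entry in the style of lexicon.tdl. This is an assumption:
# copy a real noun entry from your own ERG checkout instead of trusting it.
TEMPLATE_ENTRY = '''apple_n1 := n_-_c_le &
 [ ORTH < "apple" >,
   SYNSEM [ LKEYS.KEYREL.PRED "_apple_n_1_rel",
            PHON.ONSET voc ] ].
'''


def clone_noun_entry(template, old_word, new_word):
    """Copy a lexicon entry, replacing old_word wherever it stands alone.

    This rewrites the entry name, the orthography, and the predicate string.
    It does not adjust anything else (for example the onset value), so pick
    a template noun that behaves like the new word.
    """
    pattern = r"(?<![A-Za-z])" + re.escape(old_word) + r"(?![A-Za-z])"
    return re.sub(pattern, new_word, template)
```

Calling `clone_noun_entry(TEMPLATE_ENTRY, "apple", "octagon")` yields an entry named `octagon_n1` with orthography `"octagon"`. The result still has to be appended to lexicon.tdl and the grammar recompiled before it has any effect.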

dschaehi commented 4 years ago

Ah good. Thanks again for the quick reply! If you could put lexicon.tdl in shapeworld/realizers/dmrs/languages/, then it would be good for other people as well. (If there are more ERG files to be modified, then they could go to the same folder).

dschaehi commented 4 years ago

Oh, it would also be good to know which ERG version you used, as there seem to be incompatibilities between the .tdl files from different ERG versions. Even better, you could upload the whole ERG you used for ShapeWorld somewhere in this repository (e.g., in the dmrs folder): then I could just download your ERG version and run ace on config.tdl.

AlexKuhnle commented 4 years ago

I finally added ERG compilation information here. The compile.sh script is how I re-compiled the ShapeWorld version of the ERG. Unfortunately, I don't have access to my PhD data anymore, so I don't know how to find out which revision of the ERG repository is the right one. Moreover, ACE doesn't run on my computer, for whatever reason, so this is all I can give you. You could try a few revisions around 2017-2018, or maybe just adapt the compile.sh script to the latest version, i.e. add required new files and remove non-existing old ones (assuming that the grammar didn't change very much).

dschaehi commented 4 years ago

Hi, thank you for the update! I made a pull request so that the script is compatible with the current version of the ERG. I generated some example datasets, and it has worked fine so far.

AlexKuhnle commented 4 years ago

Thanks for your help, I think this can be closed then. I will also post the current working ERG svn repo revision here, 29160, in case a later version is not compatible anymore.
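For anyone wanting to pin that revision, the following Python sketch builds the two commands involved. Only the revision number comes from this thread. The repository URL, the checkout directory, the config path, and the output filename are assumptions, and the sketch does not replace compile.sh, which also handles the ShapeWorld-specific file modifications.

```python
import subprocess

# Assumptions: the URL and all paths below are placeholders to adapt.
# Only the revision number (29160) is taken from this thread.
ERG_SVN_URL = "http://svn.delph-in.net/erg/trunk"
ERG_REVISION = 29160


def build_commands(checkout_dir="erg", output="english.dat"):
    """Return the svn checkout and ACE compile commands as argument lists."""
    checkout = [
        "svn", "checkout", "-r", str(ERG_REVISION), ERG_SVN_URL, checkout_dir,
    ]
    # ACE: -g names the grammar config file, -G the compiled grammar to write.
    compile_grammar = [
        "ace", "-g", f"{checkout_dir}/ace/config.tdl", "-G", output,
    ]
    return [checkout, compile_grammar]


def run(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Running `run(build_commands())` would check out the pinned revision and compile it, provided svn and ace are installed and on the PATH.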