Annotald / annotald

A program for annotation in the Penn Treebank format
GNU General Public License v3.0

Sporadic copying of tokens #78

Open rtruswell opened 8 years ago

rtruswell commented 8 years ago

Quite unsure about how to describe this, but annotald has twice duplicated an entire undominated CONJP (that is, I had an IP-MAT, I relabelled it as CONJP, I was about to add it to a preceding sentence, and annotald kind-of-reluctantly duplicated it). Hitting Z didn't undo. Ever seen anything like that? Happy to email you the file (or upload here) if you want to look through it. Thanks, Rob

(PS: annotald is also reluctant to let me use right-click to embed one constituent within another on this one file. I have to work around it by moving one leaf at a time. Never had that problem before either)

aecay commented 8 years ago

Can you send me the file? I'll see if I can duplicate the problem, but it may not be easy.

If it happens again, could you check the JavaScript console for errors? You can open it through the hamburger menu to the right of the address bar > More tools > Developer tools, then click the "Console" tab at the top of the window that pops up. Copy any text that appears and paste it into this bug report.

aecay commented 8 years ago

OK, I think I have a lead. In an email you said that Annotald was hiccuping on the file you sent me (creditonat.psd) and one other. Can you confirm whether the other file (and ideally the specific sentence) that was giving you problems also had numeric lemmas (as in (NUM on=hondred-100) from the file you sent me, specifically the -100 part)? I believe these are a contributing factor to the problem.
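
One way to check the other files for such leaves is a quick scan like the sketch below. It is only a rough illustration, not part of Annotald: the function name `numeric_lemma_leaves` and the sample tree are made up, and it assumes that a leaf looks like `(LABEL text)` and that the lemma is whatever follows the last hyphen in the text. Leaves starting with `*` are skipped on the further assumption that those are traces carrying an index rather than a lemma.

```python
import re

# A leaf is "(LABEL text)" with no nested brackets, e.g. (NUM on=hondred-100).
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def numeric_lemma_leaves(psd_text):
    """Return (label, text) pairs whose text ends in a hyphen plus digits."""
    hits = []
    for label, text in LEAF.findall(psd_text):
        # Assumption: leaves such as *T*-1 are traces, not lemmatized words.
        if text.startswith("*"):
            continue
        # Assumption: the lemma is whatever follows the last hyphen.
        form, sep, lemma = text.rpartition("-")
        if sep and form and lemma.isdigit():
            hits.append((label, text))
    return hits


# Hypothetical sample tree, loosely modelled on the leaf quoted above.
sample = "( (IP-MAT (NUM on=hondred-100) (N scyllinga-scilling) (NP *T*-1)))"
print(numeric_lemma_leaves(sample))
```

To scan a real file, read its contents into a string and pass that to the function; any leaves it reports are candidates for the sentences that trigger the duplication.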

rtruswell commented 8 years ago

Ah, yes, I had wondered about that. It certainly messes with the sequential indexing. I can't remember which other file (and the first time it happened, I just fixed it and thought no more), but I remember that the two files that caused problems were close together temporally. Looking at the last-modified dates, it could have been any of these (which also have similar numerals in). All attached in their pre-annotald state.

From most to least likely the culprit: bodley26t, digby2a1t, thorneykt.

I don't think numerals in that format automatically cause the problem, though I see that they could well contribute: I'm also attaching one of the texts I worked on in York (beverleyt), which also has numerals like that but didn't give me any problems.

Thanks! Rob

aecay commented 8 years ago

The attachments must not have come through via GitHub's email system... can you send them to me directly?