Wycliffe-USA / rtt-linguiflow

GNU General Public License v2.0
12 stars 16 forks source link

Fixed regex in unbound2plain.py #62

Closed tjohnson314 closed 8 years ago

tjohnson314 commented 8 years ago

Fixes #25

bbriggs commented 8 years ago

The new regex looks like it's working.

Now we need to update asv_plain.txt and spanish_plain.txt to reflect the new standard. Can you add that data and push it onto this PR?

tjohnson314 commented 8 years ago

Oops, this isn't what I wanted. I updated the files inside the Vagrant box by running the new version of unbound2plain.py, exited, and committed them. But somehow once I'm outside the Vagrant box, the old version of the files is still there.

I'm confused now. I thought the corpora folders were synced? Maybe that doesn't guarantee that the subfolders under them are synced?

bbriggs commented 8 years ago

When you run unbound2plain.py from the tests folder, it outputs a file called asv_plain.txt.out. Move that to asv_plain.txt. Do the same for spanish. Those will serve as our test standards.

You can do this from outside the Mgiza machine by starting at the root of the git repo and running sh ./mgiza/tests/unbound2plain.sh

-- Briggs

On Jan 20, 2016, at 9:35 PM, tjohnson314 notifications@github.com wrote:

Oops, this isn't what I wanted. I updated the files inside the Vagrant box by running the new version of unbound2plain.py, exited, and committed them. But somehow once I'm outside the Vagrant box, the old version of the files is still there.

I'm confused now. I thought the corpora folders were synced? Maybe that doesn't guarantee that the subfolders under them are synced?

— Reply to this email directly or view it on GitHub.

bbriggs commented 8 years ago

Now that I think about it, the output from my_tokenizer.py depends on correct input from unbound2plain.py, so you'll need to update the following files in /tests/data:

asv_plain.txt spanish_plain.txt asv_tokenized.txt spanish_tokenized.txt

tjohnson314 commented 8 years ago

The log might be messier than it needed to be, but everything should work now, haha.

bbriggs commented 8 years ago

Okay, @tjohnson314, I hate to do this to you, but here goes: We merged that reeeeally big re-org into development last night. Can you please pull and merge that change into your PR, then push it back? That should get us fixed for good on the testing front.

stealthybox commented 8 years ago

66 is merged.

We should ensure that @tjohnson314 has a commit history though.

bbriggs commented 8 years ago

Agreed. His works as a hotfix on master and will show up in the commit history, but might not show up in the git blame once we push dev onto master.

tjohnson314 commented 8 years ago

Wow, I didn't know git blame was a thing. That's pretty cool. So what should I do?

bbriggs commented 8 years ago

I don't think you really need to do anything. You submitted a critically important hotfix to the master branch, which we really needed.

It all depends on how you want to be credited. Your commit will remain in the git history, but the blame will most likely shift to me for that line since I submitted that change in dev after yours on master. It's just a single line, though, and you have lots of other code contributed. Not to mention that you're driving a lot of our algorithm design. I wouldn't sweat credit over a single line in a regex :)

bbriggs commented 8 years ago

I really hate to do this and it's my mistake, but I fixed this in another PR thinking that this one was targeted at the master branch. Merging it in would require a lot of de-conflicting to just bring us back to where we already are.

I messed up, sorry.

tjohnson314 commented 8 years ago

If you're worried about me getting credit, it doesn't really matter to me. That's not why I'm working on this. :-)