ghpaetzold / questplusplus

Pipelined quality estimation.

Intermediate files management #19

Closed fredblain closed 8 years ago

fredblain commented 8 years ago

We need to confirm how intermediate files are managed, since this could be problematic. Here is my concern: I ran a word-level experiment and Quest++ didn't complain even though the tools.ngram.path option was not properly set. I therefore assume that Quest++ does not run the aligner if the following file already exists:

./lang_resources/alignments/alignments.word-level.out

regardless of whether it belongs to the current data set or to one from a prior run. When this file is removed, Quest++ complains about the bad configuration for the aligner.
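One way to guard against the stale-file scenario described above is to compare modification times before reusing an intermediate file. The following is only a minimal sketch under that assumption: `IntermediateFileCheck` and `isStale` are hypothetical names and are not part of Quest++.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of Quest++. */
final class IntermediateFileCheck {

    /**
     * Returns true when the cached file is missing or older than at least
     * one input file, i.e. when it should be regenerated instead of reused.
     */
    static boolean isStale(Path cached, List<Path> inputs) throws IOException {
        if (!Files.exists(cached)) {
            return true;
        }
        FileTime cachedTime = Files.getLastModifiedTime(cached);
        for (Path input : inputs) {
            // An input modified after the cache was written invalidates it.
            if (Files.getLastModifiedTime(input).compareTo(cachedTime) > 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: IntermediateFileCheck <source> <target>");
            return;
        }
        Path cached = Paths.get("lang_resources/alignments/alignments.word-level.out");
        List<Path> inputs = Arrays.asList(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(isStale(cached, inputs) ? "regenerate" : "reuse");
    }
}
```

Timestamps do not catch every case: an older data set copied into place would still look fresh. Storing a hash of the inputs next to the intermediate file would be a stricter check.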

ghpaetzold commented 8 years ago

This is very odd... I can remove tools.ngram.path entirely and it does not make a difference. Also, this is not the aligner, my dear adorable French friend :D It is the path to SRILM's binaries folder.


Gustavo Henrique Paetzold, Ph.D. Candidate in Computer Science, University of Sheffield


fredblain commented 8 years ago

Apologies @ghpaetzold, you're right (as usual, I would say) and I was wrong: I should have written tools.fast_align.path, since we are talking about word alignments.

lspecia commented 8 years ago

Carol explained tools.fast_align.path to me: it is not a path to a tool but rather to a resource that has to be pre-built, so it is fine to leave it as is and not try to regenerate it at every run.

:-)


Lucia www.dcs.shef.ac.uk/~lucia/

carolscarton commented 8 years ago

I think tools.fast_align.path is the path to the Fast Align tool. However, if you provide a file with precomputed alignments (using the -alignments parameter), it will never be used. On the other hand, if you do not provide -alignments, QuEst++ will try to generate the resource and, in that case, will need the path given in tools.fast_align.path.

Is that right, @ghpaetzold?
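The behaviour described in this comment can be sketched as a small decision function. This is an illustration of the description only, not QuEst++'s actual code: the class `AlignmentSource`, the method `resolve`, and the returned strings are hypothetical; only the `-alignments` parameter and the `tools.fast_align.path` key come from the thread.

```java
import java.util.Properties;

/** Hypothetical sketch of the described behaviour; not QuEst++ code. */
final class AlignmentSource {

    /**
     * Decides where word alignments come from for a run.
     *
     * @param alignmentsFile value of the -alignments parameter, or null if absent
     * @param config         configuration that may contain tools.fast_align.path
     */
    static String resolve(String alignmentsFile, Properties config) {
        if (alignmentsFile != null) {
            // Precomputed alignments take priority; the tool path is never read.
            return "precomputed:" + alignmentsFile;
        }
        String toolPath = config.getProperty("tools.fast_align.path");
        if (toolPath == null || toolPath.trim().isEmpty()) {
            throw new IllegalStateException(
                "No -alignments file given and tools.fast_align.path is not set");
        }
        // Without precomputed alignments, the aligner has to be run.
        return "generate-with:" + toolPath;
    }
}
```

Failing early when neither source is available makes a misconfiguration visible regardless of what is left over from earlier runs.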


Carolina Scarton PhD Candidate and Research Assistant Department of Computer Science University of Sheffield http://www.dcs.shef.ac.uk/~carolina

ghpaetzold commented 8 years ago

That is how it is supposed to work. :) But since we were not able to make Fast Align run automatically from Java without crashing, I believe it would be better to remove it from the config file altogether. What do you think?
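For context, a common way to launch an external tool from Java is `ProcessBuilder` with both output streams redirected to files, so that the child process cannot block on a full pipe, plus a timeout. The sketch below is generic and makes no claim about why Fast Align crashed here; `ExternalToolRunner` and `run` are hypothetical names, not QuEst++ code.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not part of QuEst++. */
final class ExternalToolRunner {

    /**
     * Runs a command, writing stdout and stderr to the given files,
     * and returns the exit code. Fails if the timeout is exceeded.
     */
    static int run(List<String> command, File stdoutFile, File stderrFile,
                   long timeoutSeconds) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Redirecting to files avoids blocking on unread output pipes.
        builder.redirectOutput(stdoutFile);
        builder.redirectError(stderrFile);
        Process process = builder.start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("Timed out after " + timeoutSeconds + "s: " + command);
        }
        return process.exitValue();
    }
}
```

A call for Fast Align might pass the binary location from tools.fast_align.path followed by arguments such as `-i <parallel corpus> -d -o -v`, as in the fast_align README; the file names would be whatever the run uses.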



carolscarton commented 8 years ago

I believe it ran properly on the server. Maybe it is a Windows problem?


ghpaetzold commented 8 years ago

I have "solved" this problem by simply discarding the automatic alignment production from QuEst++. It's quite unreliable for larger input files and quite slow as well...

If we decide to add it back in the future, we can simply uncomment some lines :)