corliber / cleartk

Automatically exported from code.google.com/p/cleartk

non-existent files specified for FilesCollectionReader #83

GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
The PTCR (PlainTextCollectionReader) has two params that let you specify
by name the files you want processed.  They work much like the suffixes
param.  That is, an Iterable<File> is created by Files using the value
provided by PARAM_FILE_OR_DIRECTORY.  The iterable is iterated over, and
files that satisfy a filter based on the values provided by e.g.
PARAM_FILE_NAMES are processed.  As a result, file names that appear in the
value of PARAM_FILE_NAMES but do not exist on disk are silently ignored.  I
don't know if this is good behavior or bad.  You might want PTCR to throw
an exception if a file you specify does not exist, but I can imagine other
scenarios where ignoring non-existent files is fine.
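
To make the behavior described above concrete, here is a minimal sketch of a name-based filter over an Iterable<File>. It is not the ClearTK code: the class `FileNameFilterSketch` and the method `filterByName` are hypothetical names used only for illustration. It shows why a requested name with no matching file on disk simply never shows up in the result.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the actual reader implementation.
class FileNameFilterSketch {

  // Returns the candidate files whose names appear in allowedNames.
  // The loop is driven by the candidates, so a name in allowedNames that
  // matches no candidate is dropped without any warning or error.
  static List<File> filterByName(Iterable<File> candidates, String... allowedNames) {
    Set<String> allowed = new HashSet<>(Arrays.asList(allowedNames));
    List<File> selected = new ArrayList<>();
    for (File candidate : candidates) {
      if (allowed.contains(candidate.getName())) {
        selected.add(candidate);
      }
    }
    return selected;
  }
}
```

Calling `filterByName` with candidates `a.txt` and `b.txt` and the allowed names `a.txt` and `missing.txt` returns only `a.txt`; nothing reports that `missing.txt` was never found.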

If the current behavior is fine, then we simply need to document what it
is.  If both scenarios are valid, then we need a param that makes the
reader fail when a non-existent file is specified.
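
One way such a param could work is sketched below, under assumptions: the flag `failOnMissing` and the names `StrictFileNameCheckSketch`, `unmatchedNames`, and `checkAllFound` are hypothetical and do not exist in the reader. The idea is to compute which requested names matched no file, then either ignore them (current behavior) or throw.

```java
import java.io.File;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; failOnMissing is not an existing param.
class StrictFileNameCheckSketch {

  // Returns the requested names that match none of the candidate files.
  static Set<String> unmatchedNames(Iterable<File> candidates, Collection<String> requestedNames) {
    Set<String> unmatched = new TreeSet<>(requestedNames);
    for (File candidate : candidates) {
      unmatched.remove(candidate.getName());
    }
    return unmatched;
  }

  // Throws if failOnMissing is true and any requested name was not found;
  // otherwise returns quietly, which mirrors the current ignoring behavior.
  static void checkAllFound(Iterable<File> candidates, Collection<String> requestedNames,
      boolean failOnMissing) {
    Set<String> unmatched = unmatchedNames(candidates, requestedNames);
    if (failOnMissing && !unmatched.isEmpty()) {
      throw new IllegalArgumentException("Requested file names not found: " + unmatched);
    }
  }
}
```

With the flag defaulting to false, existing users would see no change, and users who want the strict check could opt in.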

I don't care one way or another.  I am filing this issue because it seems
like a place where unexpected behavior might occur.

Original issue reported on code.google.com by pvogren@gmail.com on 6 Apr 2009 at 10:57

GoogleCodeExporter commented 8 years ago

Original comment by pvogren@gmail.com on 6 Apr 2009 at 10:57

GoogleCodeExporter commented 8 years ago
I definitely think there should be an exception if I list a file and it isn't
found on disk.

Original comment by steven.b...@gmail.com on 7 Apr 2009 at 2:41

GoogleCodeExporter commented 8 years ago
Changed this ticket to reflect the name change of PlainTextCollectionReader
to FilesCollectionReader.

Original comment by pvogren@gmail.com on 15 Apr 2009 at 11:49

GoogleCodeExporter commented 8 years ago
I was looking at FilesCollectionReader and thinking that I really don't want to
bother with throwing an exception if a file listed in PARAM_FILE_NAMES is not
found.  It is documented so that the user knows that the values are meant as
allowed values.  The notion that you have a list of file names that you want to
run through is a different paradigm than what is used here (where we are
filtering the files in a directory) and probably deserves its own collection
reader.
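
For contrast, the "list of file names" paradigm could look roughly like the sketch below. This is an assumption about what such a reader might do, not an existing ClearTK class: `ExplicitFileListSketch` and `resolveAll` are hypothetical names. Here the loop is driven by the listed names rather than by the directory contents, so a missing file is detected naturally.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not an existing collection reader.
class ExplicitFileListSketch {

  // Resolves each listed name against baseDirectory and fails fast
  // on the first name that does not refer to an existing regular file.
  static List<File> resolveAll(File baseDirectory, List<String> fileNames) {
    List<File> resolved = new ArrayList<>();
    for (String name : fileNames) {
      File file = new File(baseDirectory, name);
      if (!file.isFile()) {
        throw new IllegalArgumentException("Listed file does not exist: " + file.getPath());
      }
      resolved.add(file);
    }
    return resolved;
  }
}
```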

If you feel strongly that the behavior should change, then please reopen this 
issue.

Original comment by pvogren@gmail.com on 9 Jan 2011 at 4:23