Closed pmreyes2 closed 3 years ago
Can we instead implement more rigorous file grabbing? The fundamental problem causing #89 is that the code is not actually only grabbing JSON files.
More specifically, this line is the problem: https://github.com/EarthCubeInGeo/resen/blob/432c2c3f1a4b8b944691c617b802c23af36635e0/resen/Resen.py#L866
One solution is to use regex
to prune the output file list of the os.listdir
. Something like ([^.]*.json)
should work, making sure that the whole string is matched.
@asreimer I agree. So only .json or .JSON files will be considered? or we use something like .upper() to compare only capital letters.
Which is why glob is the right answer....
On 7/13/21 2:07 PM, Ashton Reimer wrote:
@.**** commented on this pull request.
In resen/Resen.py https://github.com/EarthCubeInGeo/resen/pull/90#discussion_r669111310:
@@ -864,11 +864,17 @@ def __get_valid_cores(self):
for each JSON file in core directory, read in list of cores
cores = [] for fn in os.listdir(core_dir): +
- if not fn.lower().endswith('json'):
This allows hidden files (those with a |.| in front) to be picked up. I don't agree that this should be acceptable. I think we can test for this with |startswith('.')|.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/EarthCubeInGeo/resen/pull/90#pullrequestreview-705648744, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMC5ZVLKWPAXBGEPFRSJO3TXSTJNANCNFSM5AJV3OLA.
The latest commit changes the code to use glob
as a way to search for resen-core images specification files.
I think the PR code is really hard to read and figure out what is going on:
for fn in glob.glob1(core_dir,"*.[jJ][sS][oO][nN]"):
try:
with open(os.path.join(core_dir,fn),'r') as f:
cores.extend(json.load(f))
except json.decoder.JSONDecodeError:
print('WARNING: {} is not a valid JSON file! Skipping this file.'.format(fn))
except:
print('WARNING: {} can not be read. Skipping this file.'.format(fn))
There are a number of things wrong with the PR for this one. First off, there is no glob1() function in the glob module. The glob() function takes a single argument which is the pattern (not a path and filename). The pattern is a simple shell-style wildcard match, not a regular expression so the convoluted [jJ][sS]... pattern wouldn't work anyways.
I still think that the best approach is the specify that the configuration files we are looking for simply have the lower case .json extension. That is the normal convention for Unix systems and Windows filesystems are case insensitive so it doesn't matter there.
The other problem is that the open() call adds the core_dir path back in, but will already be there in the output of the glob call(). The default mode for open is 'r', so no need to specify (again, with the goal if reducing the number of symbols you need to read when looking at the code).
I would also reduce the except's to a single catch all that just notifies us that the load failed. I'm also a fan of using formatted string literals to make the code easier to read.
So the cleaned up version of the code would look like:
for filename in glob.glob(os.path.join(core_dir, '*.json')):
try:
with open(filename) as f:
cores.extend(json.load(f))
except:
print('WARNING: Problem reading {filename}. Skipping.')
If you really wanted to read both .json and .JSON file patterns, then build them up as separate lists:
json_files = glob.glob(os.path.join(core_dir, '*.json'))
JSON_files = glob.glob(os.path.join(core_dir, '*.JSON'))
for filename in [*json_files, *JSON_files]:
...
@valentic I've followed your suggestions and I agree that those modifications make the code easier to read.
There is a difference in the warning message though. This time the full path of the file is given, in all the previous versions of this code, only the file base name was part of the warning message. I prefer this version that prints the full path since it gives more information to the user on where to find the offending file.
BTW, glob.glob1
does exist but it returns only filenames without directory. I was using it before in order not to touch the open
code that was joining the file with the directory. I am using now glob.glob
that outputs the full path.
glob1 is not part of the public interface for the glob module. See the python docs or help(glob). Writing code that uses it can potentially break because it is a private implementation detail. Its use in other python modules has been considered a bug:
https://bugs.python.org/issue16620
On 7/14/21 12:45 PM, Pablo M. Reyes wrote:
@valentic https://github.com/valentic I've followed your suggestions and I agree that those modifications make the code easier to read. There is a difference in the warning message though. This time the full path of the file is given, in all the previous versions of this code, only the file base name was part of the warning message. I prefer this version that prints the full path since it gives more information to the user on where to find the offending file. BTW, |glob.glob1| does exist but it returns only filenames without directory. I was using it before in order not to touch the |open| code that was joining the file with the directory. I am using now |glob.glob| that outputs the full path.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/EarthCubeInGeo/resen/pull/90#issuecomment-880161187, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMC5ZTBTK4CQPBA4B7DGTTTXXSNBANCNFSM5AJV3OLA.
I believe this is the final iteration of fixing this code.
This adds a general exception to handle possible file problems in the files used for specifying
resen-core
images inside folder ~/.config/resen/cores. The issue is reported in #78 and in #89 As an example, one of those problems arose when a file was opened with thevim
editor which creates a swap file with non-utf-8 characters in it which causedresen
to crash at the start. With this fix, a warning is issued about the file that couldn't be read andresen
start with no problems.