Closed ots22 closed 1 year ago
@ots22 sorry for the delay on this, yes that's exactly what observed. And thanks - I somehow missed that regex is required in that field!
Initially I just tried changing .
to \\.
as it was the only regex-critical I thought was in there but that still failed in the same way.
After copying the regex from the test with 10943 (ie: using filename_regexp=".*115048\\.mrc"
) the test succeeds instead. So looks like I'm just bad at regex!
I suspect a lot of the intended/potential userbase aren't regex experts though, so I'd like to float an idea:
filename_regexp
would be interpreted as regexpfilename_regexp
would be interpreted as a string (like you would provide to glob.glob
to get a list of filepaths satisfying the string/wildcard conditions)If possible, I believe this would make empiarreader more accessible to those new to coding/python. Also, it might be nice to throw a more easily interpretable error upon this failure, but I'm unsure of the optimal way to implement that. Let me know your thoughts!
Ah good spot. Apparently the full matching filename is https://ftp.ebi.ac.uk/empiar/world_availability/10340/data/Micrographs/Case1/FoilHole_21756216_Data_21768957_21768958_20181221_2054-115048.mrc
, which is why the .*
is needed. Full path could be useful for matching across subdirectories, but the common case is probably just the basename.
Agreed about having the option to have a mode for string matching (or simple wildcards), and the error message. I think read_partition(0)
might also be a bit too low-level for typical user facing code anyway, but a full read()
is expensive.
See #20.
Accessing EMPIAR entry 10340 with EmpiarSource appears not to load any entries.
Raises
StopIteration
followed byIndexError
(indicating the datasource is empty).(The
.
in the regular expression is escaped as\\.
so it isn't interpreted as 'any character' - but makes no difference to the result)@JatGreer is this what you observed?