Open esturdivant-usgs opened 6 years ago
from Ellyn:
My but I was inconsistent in my naming rules :> The images follow one "rule" and the GCPs etc another, and the 1st N characters wouldn't work for differentiating the GCPs from transects- they'd need to be the last N chars... I doubt multiple regex would be easy to get to work...
Maybe exact match or some regex as your condition?
import glob
import os

def upload_files_matching_xml(sb, item, xml_file, max_MBsize=2000, replace=True, verbose=False):
    # Upload all files matching the XML filename to SB page.
    # E.g. xml_file = 'path/data_name.ext.xml' will upload all files beginning with 'data_name'
    # Optionally remove all files already present.
    if replace:
        # Remove all files (and facets) from child page
        item = remove_all_files(sb, item, verbose)
    # List all files matching XML
    dataname = xml_file.split('.')[0]
    dataname = dataname.split('_meta')[0]
    # up_files = glob.glob(searchstr)
    up_files = [fn for fn in glob.iglob(dataname + '*')
                if not fn.endswith('_orig')]
    # Set aside files over the size limit. Iterate over a copy so that
    # removing from up_files does not skip the next entry.
    bigfiles = []
    for f in list(up_files):
        if os.path.getsize(f) > max_MBsize*1000000: # convert megabytes to bytes
            bigfiles.append(os.path.basename(f))
            up_files.remove(f)
    # Upload all files pertaining to data to child page
    if verbose:
        print("UPLOADING: files matching '{}'".format(os.path.basename(dataname + '*')))
        if len(bigfiles) > 0:
            print("**TO DO** Files {} are too big to upload here. Please manually upload afterward.".format(bigfiles))
    item = sb.upload_files_and_upsert_item(item, up_files) # upsert should "create or update a SB item"
    if verbose:
        print("UPLOAD COMPLETED.")
    return item, bigfiles
Simple solution: change the XML filename to match the data.
However, this would not address the problem of having multiple zip files with slightly different names that match one XML. To address this I could add the ability to search only based on a certain number of beginning characters. OR...
I could have a separate process for datasets in this category: multiple large zip files to accompany a single XML.
Change this:
dataname = xml_file.split('.')[0]
dataname = dataname.split('_meta')[0]
to this:
dataname = xml_file.split('.')[0]
dataname = dataname.split('_meta')[0] # will work on _metadata also
dataname = dataname[:number_of_chars_in_data_prefix]
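One caveat with slicing `dataname` directly: when `xml_file` includes a directory (as in 'path/data_name.ext.xml'), the character count would include the path. A minimal sketch of prefix matching on the basename only is below. The function name `files_matching_prefix` and the parameter `n_chars` are hypothetical, not part of the existing code.

```python
import glob
import os

def files_matching_prefix(xml_file, n_chars):
    """List files next to xml_file whose names start with the first
    n_chars characters of the XML's basename (hypothetical helper)."""
    folder, xml_name = os.path.split(xml_file)
    prefix = xml_name[:n_chars]
    # Escape the literal parts so characters like '[' are not read as wildcards.
    pattern = os.path.join(glob.escape(folder), glob.escape(prefix) + '*')
    # Like the current function, skip '_orig' backups; the XML itself is included.
    return sorted(fn for fn in glob.glob(pattern) if not fn.endswith('_orig'))
```

With `n_chars=3`, an XML named 'abc_meta.xml' would pick up 'abc_gcps.zip' and 'abc_transects.zip' in the same folder, but not 'abd_other.zip'.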
Another current challenge: any file whose name begins with the part of the XML filename that precedes the '.' or '_meta' will also be uploaded, even if it has an additional suffix. E.g. the page created for ubw_meta.xml (or ubw.tif.xml) will also hold ubw_test.xml.
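One way to avoid picking up ubw_test.xml would be to compare the stripped name exactly rather than using a wildcard. This is only a sketch: `data_stem` and `files_with_same_stem` are hypothetical names, and it applies the same '.' and '_meta' splitting as the current function, but to the basename.

```python
import os

def data_stem(path):
    """Basename up to the first '.', with anything from '_meta' onward dropped."""
    name = os.path.basename(path)
    return name.split('.')[0].split('_meta')[0]

def files_with_same_stem(xml_file, candidates):
    """Keep only the candidates whose stem equals the XML's stem exactly."""
    target = data_stem(xml_file)
    return [fn for fn in candidates if data_stem(fn) == target]
```

For ubw_meta.xml this keeps ubw.tif and ubw.tif.xml and drops ubw_test.xml. It would not help with the case of several differently named zip files that belong to one XML, so it is an alternative rule rather than a full replacement.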
Another possible work-around is to use a field in the metadata to specify the data filenames.
commit 14840641bd8f4de690e454d3acdf4222a32b31aa
Right now it matches the XML filename with or without a '_meta' suffix.
It might be better to let the user specify a filename convention to use to match data to XML. e.g. number of characters to match from the beginning of the filename. Alternatively, allow the user to input a regex (regular expression) string for matching.
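A rough sketch of the regex option, assuming the user passes a pattern that is tested against each filename in the data folder. The name `files_matching_regex` and its parameters are hypothetical. Using `re.search` lets the pattern anchor at either end, which would cover the case where only the last characters distinguish the GCPs from the transects.

```python
import os
import re

def files_matching_regex(folder, pattern):
    """List files in folder whose names match the user-supplied regex
    (hypothetical helper; '_orig' backups are skipped as in the current code)."""
    regex = re.compile(pattern)
    return sorted(os.path.join(folder, name) for name in os.listdir(folder)
                  if regex.search(name) and not name.endswith('_orig'))
```

For example, `files_matching_regex(folder, r'_gcps\.zip$')` would select every file ending in '_gcps.zip', while `r'^ubw(\.|_meta)'` would match ubw.tif and ubw_meta.xml but not ubw_test.xml.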