Entity and Attribute section incorrectly reads CSV files

DOI-USGS / fort-pymdwizard

The MetadataWizard is a useful tool designed to facilitate FGDC metadata creation for spatial and non-spatial data sets. It is a cross-platform desktop application built using an open-source Python architecture.

https://usgs.github.io/fort-pymdwizard/


Entity and Attribute section incorrectly reads CSV files #118

Closed mcannister-usgs closed 2 years ago

mcannister-usgs commented 4 years ago

I've attached a couple of CSVs that I tried to read into the tool today. The first few column labels are read correctly, but not all columns are listed, and the values described under the columns do not line up with the values in the original files. In the screenshot below, the tool loads the headers from the first 4 columns but the data from the final 4 columns of the dataset.

LDWF Data.zip

ColinTalbert commented 4 years ago

We currently interpret a # symbol anywhere in a CSV as the start of a comment, so anything after the pound sign on that line is ignored. This is clearly not always the appropriate behavior.
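A minimal reproduction, assuming the wizard reads the file with `pandas.read_csv(..., comment="#")` (the thread names `pandas.read_csv` but not the exact call). The sample data and column names are made up. When a # appears in a column label, the rest of the header line is discarded, so the header has fewer names than the data rows have fields. pandas then uses the extra leading fields as the row index, which shifts every value under the wrong label. That is consistent with the symptom reported above (first labels kept, values taken from the last columns).

```python
import io

import pandas as pd

# Made-up sample: the third column label contains a "#".
CSV_TEXT = (
    "site,depth,sample#,temp\n"
    "A,1.5,7,20.1\n"
    "B,2.0,8,19.8\n"
)

# Treating "#" as a comment truncates the header line to "site,depth,sample".
# The data rows still have 4 fields, so pandas infers the first field as the
# row index and places the remaining 3 values under the 3 surviving labels.
commented = pd.read_csv(io.StringIO(CSV_TEXT), comment="#")

# Without a comment character, all 4 labels and their values line up.
plain = pd.read_csv(io.StringIO(CSV_TEXT))

print(commented)
print(plain)
```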

Ultimately, it would be nice to expose this and (some of) the other 50 or so parameters that pandas.read_csv offers for handling the variability of the CSV format. For the most part we just take the defaults. This is where you would change that functionality in the code: https://github.com/usgs/fort-pymdwizard/blob/master/pymdwizard/core/data_io.py#L102 but you would also need to implement some form of UI for users to enter these parameters, probably on the current settings form.
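One way to expose a few `read_csv` options is a small wrapper that merges user-entered settings over defaults before calling `pandas.read_csv`. This is a hypothetical sketch, not the code in `data_io.py`: the function name `read_csv_with_options`, the `DEFAULT_CSV_OPTIONS` dict, and the choice to default `comment` to `None` (so # is read as ordinary text) are all assumptions. Restricting the keys to a known whitelist keeps a settings form from passing arbitrary arguments through to pandas.

```python
import io

import pandas as pd

# Hypothetical defaults that a settings form could edit. comment=None means
# "#" is treated as ordinary text unless the user opts in to comment handling.
DEFAULT_CSV_OPTIONS = {
    "sep": ",",
    "comment": None,
    "skiprows": None,
}


def read_csv_with_options(source, **user_options):
    """Read a CSV, letting the user override a whitelisted set of read_csv options."""
    unknown = set(user_options) - set(DEFAULT_CSV_OPTIONS)
    if unknown:
        raise ValueError(f"Unsupported CSV option(s): {sorted(unknown)}")
    # User-entered values take precedence over the defaults.
    options = {**DEFAULT_CSV_OPTIONS, **user_options}
    return pd.read_csv(source, **options)
```

For example, `read_csv_with_options("data.csv")` keeps a label such as `sample#` intact, while `read_csv_with_options("data.csv", comment="#")` restores the current behavior for files that really do contain comment lines.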

ColinTalbert commented 4 years ago

Currently, your only option is to make a copy of the CSV with the pound signs replaced by some other character, run the wizard against that copy, and then swap the replacement character back to # in the Entity and Attribute (EA) section.
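The workaround can be scripted. This sketch writes a copy of a CSV with every # replaced by a placeholder and provides the reverse substitution for labels later edited in the EA section. The placeholder `__HASH__` and the function names are hypothetical; pick a placeholder that never occurs in your data. Files are opened with `newline=""` so the copy keeps the original line endings.

```python
# Hypothetical placeholder; choose a string that does not already occur in the file.
PLACEHOLDER = "__HASH__"


def replace_hashes(text, placeholder=PLACEHOLDER):
    """Return text with every '#' replaced by the placeholder."""
    if placeholder in text:
        raise ValueError(f"Placeholder {placeholder!r} already occurs in the text")
    return text.replace("#", placeholder)


def restore_hashes(text, placeholder=PLACEHOLDER):
    """Undo replace_hashes, e.g. on a column label copied from the EA section."""
    return text.replace(placeholder, "#")


def make_hash_free_copy(src_path, dst_path, placeholder=PLACEHOLDER, encoding="utf-8"):
    """Write a copy of src_path with every '#' replaced; newline='' preserves line endings."""
    with open(src_path, encoding=encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=encoding, newline="") as dst:
        dst.write(replace_hashes(text, placeholder))
```

For example, `make_hash_free_copy("original.csv", "original_nohash.csv")` produces the file to load in the wizard, and `restore_hashes("sample__HASH__")` gives back `"sample#"`.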

dignizio-usgs commented 2 years ago

Closing this ticket with the notes from Colin's comment above. The reported issue is caused by the interpretation of the # sign in the file as a comment. I will note the issue to @ennsk and @tnorkin so they are aware of it.