Datafable / dolichopodidae-portugal-data-publication

Data publication and paper for the Dolichopodidae from Portugal

How is the taxon information stored in the database? #5

Open peterdesmet opened 9 years ago

peterdesmet commented 9 years ago

@marcpollet, we already have an export of the occurrences, but how is the taxon information stored in the database? Is there a separate table with one record for each species, including classification information?

marcpollet commented 9 years ago

Peter,

I keep all taxonomic information, from family down to species, in a separate Access database (SYSTDOL) that is linked to the EURODOL database holding the distribution data. It includes all attributes, even type locality, European distribution (the basis for Fauna Europaea, version 2004), and even descriptions (over 160 characters).

So, basically, you can get the species name in any format required.

Cheers, Marc

peterdesmet commented 9 years ago

Great. Is there a link between those databases beyond the scientificName, such as an ID? What we will need are two exports:

marcpollet commented 9 years ago

Peter,

I have no idea whether Darwin Core is specimen-based (many databases are ...), but mine is not:

1) Taxonomic database: there is a species_id (a number), a species_cd (usually 8 alphanumeric characters: the first 4 of genus_name followed by 4 derived, not always in the same way, from species_name), and a species_name (full name without author and year). Tables are linked on species_cd, which is the most practical.

2) Occurrence: each record is the combination of one species and one sample, with the corresponding number of males and females identified. Some samples contain hundreds of specimens of one species, and I cannot split up "my" records in hundreds of single specimen records (I do not see the use of this). A compromise could be that we do not include the specimen numbers (males/females) into the Darwin Core, or perhaps only the total number of specimens.
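
As an illustration of the two points above, here is a minimal sketch of how sample-by-species records could be linked to the taxon table on species_cd and reduced to a total specimen count. It is not the actual export procedure. The field names species_id, species_cd, species_name and sample_cd come from the description above; the `males` and `females` column names, the sample values, and the choice of Darwin Core terms (`taxonID`, `scientificName`, `eventID`, `individualCount`) as targets are assumptions.

```python
def to_occurrences(taxa, records):
    """Join sample-by-species records to the taxon table on species_cd.

    One output row per (sample, species) combination, so a sample with
    hundreds of specimens of one species stays a single record.
    """
    taxa_by_cd = {taxon["species_cd"]: taxon for taxon in taxa}
    occurrences = []
    for rec in records:
        taxon = taxa_by_cd.get(rec["species_cd"])
        if taxon is None:
            raise KeyError(f"unknown species_cd: {rec['species_cd']}")
        occurrences.append({
            "taxonID": taxon["species_id"],           # assumed mapping
            "scientificName": taxon["species_name"],
            "eventID": rec["sample_cd"],              # assumed mapping
            # Only the total is kept; males/females are summed.
            "individualCount": rec["males"] + rec["females"],
        })
    return occurrences


# Hypothetical example rows, not taken from SYSTDOL or EURODOL.
taxa = [
    {"species_id": 1, "species_cd": "DOLIUNGU",
     "species_name": "Dolichopus ungulatus"},
]
records = [
    {"sample_cd": "S001", "species_cd": "DOLIUNGU", "males": 12, "females": 30},
]
occurrences = to_occurrences(taxa, records)
```

Keeping one row per sample and species, with a count, avoids splitting a record into hundreds of single-specimen rows.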

One more thing: the table I provided you with is, of course, a combination of information retrieved from different tables: localities, locations, sampling sites, samples, specimens (or identifications), and species. Most important are the sample and identifications tables as they represent visible, touchable items. Moreover, if a wrong sample_cd (there is also a sample_id) is used in the identifications table, then the identifications might refer to nothing or a wrong location, you see. I have a strict procedure to code samples.
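
The risk described above, an identification pointing at a sample_cd that does not exist, can be checked mechanically before export. The sketch below is one possible check under assumed table layouts, with tables represented as lists of dictionaries and hypothetical codes; it is not the coding procedure used in the database.

```python
def find_orphan_identifications(samples, identifications):
    """Return identifications whose sample_cd matches no known sample."""
    known_codes = {sample["sample_cd"] for sample in samples}
    return [
        ident for ident in identifications
        if ident["sample_cd"] not in known_codes
    ]


# Hypothetical rows for illustration only.
samples = [
    {"sample_id": 1, "sample_cd": "S001"},
    {"sample_id": 2, "sample_cd": "S002"},
]
identifications = [
    {"sample_cd": "S001", "species_cd": "DOLIUNGU"},
    {"sample_cd": "S099", "species_cd": "DOLIUNGU"},  # mistyped sample code
]
orphans = find_orphan_identifications(samples, identifications)
```

A check like this only catches identifications that refer to nothing. A mistyped code that happens to match another existing sample would still point to a wrong location and needs a separate review.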

Cheers, Marc