NHMDenmark / Mass-Digitizer

Common repo for the DaSSCo team
Apache License 2.0

Import numbers for Pinned insects were off compared to imported directory count #458

Status: Closed (jlegind closed this issue 10 months ago)

jlegind commented 10 months ago

What is the issue?

The line-count command run on the 3.Imported specify directory yielded a higher number than the count returned by a query in the Specify UI.

Detailed description of the issue.

We want the number of records in the 3.Imported specify directory to match the number we see when querying Specify. Currently, the number of pinned insect records in Specify with project name 'DaSSCo' is slightly lower than the count from the 3.Imported specify directory.

Estimate level of effort required.

difficult (and it turned out to be that)

What is the expected acceptable result?

The record count for the directory and the record count in the Specify DB should match.

It might also be an idea to put some pseudocode if relevant.

The PowerShell code below produces a line count for each .tsv dataset in the current directory, skipping the first 8 lines of each file, and writes the results to output.csv.


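    # List every .tsv file in the current directory.
    Get-ChildItem *.tsv |
        Select-Object -Property @(
            'FullName'
            # Calculated property: lines remaining after skipping the first 8.
            @{ Name = 'LineCount'; Expression = {
                (Get-Content -Path $_.FullName | Select-Object -Skip 8 | Measure-Object).Count
            }}
        ) |
        # Write one row per file (FullName, LineCount) to output.csv.
        Export-Csv output.csv -NoTypeInformation

jlegind commented 10 months ago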

The numbers for Pinned insects are correct now. The issue turned out to be a single pinned insects file that had not been imported into Specify. It stands out in that its suffix is '.csv' rather than '.tsv'.
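
A quick check for files with an unexpected suffix might help catch a case like this earlier. The following is only a minimal sketch, not code from the repo: the function name Get-UnexpectedSuffixFile and its parameters are hypothetical, and it assumes it is run from inside the directory being checked.

```powershell
# Hypothetical helper (not part of the original issue): return the names of
# files in a directory whose extension differs from the expected one.
function Get-UnexpectedSuffixFile {
    param(
        [string]$Path = '.',
        [string]$Expected = '.tsv'
    )
    Get-ChildItem -Path $Path -File |
        # -ne is case-insensitive, so '.TSV' is treated the same as '.tsv'.
        Where-Object { $_.Extension -ne $Expected } |
        Select-Object -ExpandProperty Name
}

# Tally files per extension to see every suffix present in the directory.
Get-ChildItem -Path '.' -File |
    Group-Object -Property Extension |
    Select-Object -Property Name, Count

# List any file that is not a .tsv (for example, a stray .csv).
Get-UnexpectedSuffixFile -Path '.' -Expected '.tsv'
```

Any file reported by the last command would be worth checking against Specify by hand, since a *.tsv filter will not pick it up.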