Closed NoonanM closed 6 years ago
I can make it take multiple files and then merge them into a single file for processing. I'm a little worried, though, that there could be column mismatches among these files, in which case a simple merge may run into problems.
For example, some files may have extra columns that are not available in other files. Merging them would create empty columns for the other data, and I'm not sure whether empty columns will cause problems in some calculations.
This should actually belong to the data preparation step, if we decide to have one. The user can preview the merge, spot problems, and possibly select different combinations of files to avoid them.
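To make the empty-column concern concrete, here is a minimal base R sketch of a merge that pads missing columns with NA. The tables, column names, and the bind_fill helper are hypothetical and only illustrate the idea; this is not the app's code.

```r
# Hypothetical per-file tables; the column names are illustrative only.
file_a <- data.frame(id = "Cilla", longitude = 31.75, latitude = -24.18,
                     stringsAsFactors = FALSE)
file_b <- data.frame(id = "Gull1", longitude = 8.10, latitude = 54.20,
                     heading = 120, stringsAsFactors = FALSE)

# Row-bind a list of data frames, adding NA columns where a table lacks them.
bind_fill <- function(tables) {
  all_cols <- unique(unlist(lapply(tables, names)))
  padded <- lapply(tables, function(tbl) {
    for (col in setdiff(all_cols, names(tbl))) {
      tbl[[col]] <- NA  # recycled to the number of rows
    }
    tbl[all_cols]       # same column order for every table
  })
  do.call(rbind, padded)
}

merged <- bind_fill(list(file_a, file_b))
print(merged)
```

The rows from file_a end up with NA in the heading column, which is exactly the kind of empty column a preview step could show to the user.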
Can you run as.telemetry on the individual files and merge them after?
I just realized that wouldn't keep them in the same projection. as.telemetry should be able to handle individuals with NA columns, though.
as.telemetry can work on each individual file, and each telemetry object is independent of the others.
The data frame/data.table used in my app needs to merge all animals into one table. That requires the same columns for all files, or adding empty columns for files with fewer columns. The app will keep the telemetry objects for all calculations that need a telemetry input, but that means the data in the merged table can differ from the telemetry objects (mainly by some empty columns).
Merging them into a single csv before importing with as.telemetry should at least make the data consistent across the telemetry objects and the data.table. It should also keep them in the same projection.
However, I'm not sure these files can be merged directly without problems, e.g. if some file has a different format (such as the timestamp format). I think having this feature may make users think they can combine any files, and that could include files with very different formats, even different column names.
For now it's possible to implement the feature for the simplest case.
Any inconsistency in format or columns cannot be handled automatically, and we have to ask the user to resolve it first. Hopefully the formats are consistent in most cases.
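A check like the one below could be used to detect column inconsistencies up front and report them to the user instead of merging silently. This is only a sketch in base R; the column_mismatch helper and the sample tables are hypothetical.

```r
# Hypothetical tables standing in for two uploaded files.
tables <- list(
  buffalo = data.frame(id = "Cilla", longitude = 31.75, latitude = -24.18),
  gulls   = data.frame(id = "Gull1", longitude = 8.10, latitude = 54.20,
                       heading = 120)
)

# For each table, list the columns it lacks relative to the union of all columns.
# Tables that have every column are left out of the result.
column_mismatch <- function(tables) {
  all_cols <- unique(unlist(lapply(tables, names)))
  absent <- lapply(tables, function(tbl) setdiff(all_cols, names(tbl)))
  absent[lengths(absent) > 0]
}

mismatch <- column_mismatch(tables)
if (length(mismatch) > 0) {
  for (file_name in names(mismatch)) {
    message(file_name, " is missing: ",
            paste(mismatch[[file_name]], collapse = ", "))
  }
}
```

An empty result would mean the simplest case applies and the files can be merged directly.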
@chfleming @NoonanM Do we have some sample data files that represent the common cases? And do we want to load multiple data files that come from different animals or formats?
One approach to solving the column-name mismatches across the files:
Use as.telemetry to import each file, merge the time, longitude, and latitude columns of each telemetry object into a single data frame (with an identity column added), then import that with as.telemetry again.
Wrap the as.telemetry step into a function.
That approach could drop optional columns that are imported by as.telemetry, such as errors and velocities. The following will work better, but is slightly inefficient because the projection happens twice:
Build a data.frame of all longitudes and latitudes and feed it into ctmm:::suggest.projection(data.frame). The output will be a projection string centered on and oriented to the data.
Apply projection(telemetry_object) <- projection_string to the telemetry objects.
@chfleming @NoonanM Do you have some sample data files that can be used to test this use case?
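The first approach (keep only the core columns and add an identity column) can be sketched in base R as follows. The animals list holds plain data frames as stand-ins for imported telemetry objects, and the column names are assumptions for illustration.

```r
# Hypothetical stand-ins for imported animals (plain data frames, not telemetry objects).
animals <- list(
  Cilla = data.frame(
    timestamp = as.POSIXct(c("2005-07-14 05:35:00", "2005-07-14 06:35:00"), tz = "UTC"),
    longitude = c(31.75, 31.76),
    latitude  = c(-24.18, -24.19),
    speed     = c(0.1, 0.3)          # optional column, dropped by the merge
  ),
  Gabs = data.frame(
    timestamp = as.POSIXct("2005-08-23 06:35:00", tz = "UTC"),
    longitude = 31.80,
    latitude  = -24.20
  )
)

core_cols <- c("timestamp", "longitude", "latitude")

# Keep only the core columns of each table and tag every row with the animal name.
stacked <- do.call(rbind, Map(function(id, tbl) {
  data.frame(identity = id, tbl[core_cols], stringsAsFactors = FALSE)
}, names(animals), animals))
rownames(stacked) <- NULL
print(stacked)
```

As noted in the thread, the optional columns (here speed) are lost this way, which is the main drawback compared with re-projecting the individually imported objects.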
I'm testing with some data and found this error.
library(ctmm)
library(data.table)  # provides rbindlist()

# Change these to the paths of the downloaded files.
files <- c("/Users/xhdong/Projects/ctmm-Shiny/data/buffalo/Kruger African Buffalo, GPS tracking, South Africa.csv",
           "/Users/xhdong/Projects/ctmm-Shiny/data/gulls/FTZ_ Foraging in lesser black-backed gulls (data from Garthe et al. 2016).csv")
tele_list_list <- lapply(files, as.telemetry)
# Flatten one level: from a list per file to a single list keyed by animal name.
tele_list <- unlist(tele_list_list, recursive = FALSE)
df_list <- lapply(tele_list,
                  function(tele) { tele[c("longitude", "latitude")] })
dt <- rbindlist(df_list)
proj_suggested <- ctmm:::suggest.projection(dt)
lapply(tele_list, function(tele) {
  ctmm::projection(tele) <- proj_suggested
  return(tele)
})
# The error occurs at item 10:
projection(tele_list[[10]]) <- proj_suggested
Error in `[<-.data.frame`(x3, i, ..., value = value) :
replacement has 0 items, need 11928
I don't have rbindlist or see one in ctmmweb, so I just ran do.call on rbind here.
Calling projection()<- on a list of objects would be useful, so I implemented that.
The data had an NA heading, which my code wasn't prepared for. The number of satellites was also NA for that row, so it looks like an incomplete or corrupted measurement. as.telemetry now has an option rm.na that determines how incomplete measurements are handled: either the row is deleted or the column is deleted. The default is the row. This is a kind of device failure, so I don't know what best practice would be, but here rm.na="row" seems to make the most sense. I also added code to make sure that some information is complete regarding the velocity vector and error ellipses.
I am running a check on the code now and will push to GitHub when it finishes.
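For readers unfamiliar with the row-versus-column trade-off, the two ways of handling an incomplete measurement look roughly like this in base R. This is an illustration only, not the ctmm implementation; the drop_incomplete helper, the "column" mode name, and the sample fixes are hypothetical.

```r
# Hypothetical fixes where the second row is an incomplete measurement.
fixes <- data.frame(
  longitude  = c(8.10, 8.20, 8.30),
  latitude   = c(54.20, 54.30, 54.40),
  heading    = c(120, NA, 95),
  satellites = c(7, NA, 6)
)

# Drop either the rows or the columns that contain any NA.
drop_incomplete <- function(tbl, mode = c("row", "column")) {
  mode <- match.arg(mode)
  if (mode == "row") {
    tbl[complete.cases(tbl), , drop = FALSE]
  } else {
    tbl[, colSums(is.na(tbl)) == 0, drop = FALSE]
  }
}

by_row <- drop_incomplete(fixes, "row")        # loses one fix, keeps all columns
by_column <- drop_incomplete(fixes, "column")  # keeps all fixes, loses two columns
```

Deleting the row costs one location here, while deleting columns would discard heading and satellite information for every fix.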
@chfleming I assume the code is already finished and in github now?
Yes, sorry.
When we are importing multiple files, the result usually is a list of telemetry objects, named by animal id.
What should we do when there are duplicated animal ids from different files? The app assumes all animals have unique names (this is not a problem when you import a single file). Should different files with the same animal id be combined into a single telemetry object with the data merged?
I think we at least need to give a warning message about this.
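A check along these lines could drive such a message. It is a minimal sketch in base R; the file names and ids are made up, and ids_by_file is a hypothetical structure holding the animal ids found in each file.

```r
# Hypothetical animal ids found in each uploaded file.
ids_by_file <- list(
  "buffalo_2005.csv" = c("Cilla", "Gabs"),
  "buffalo_2006.csv" = c("Gabs", "Toni")
)

# Ids that occur in more than one file (duplicates within a file are ignored).
all_ids <- unlist(lapply(ids_by_file, unique), use.names = FALSE)
shared_ids <- unique(all_ids[duplicated(all_ids)])

if (length(shared_ids) > 0) {
  message("Animal ids found in more than one file: ",
          paste(shared_ids, collapse = ", "))
}
```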
I would run the names through make.unique
OK. That's a good idea.
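For reference, this is how base R's make.unique behaves on a flattened list with a repeated name. The tele_list here is a placeholder list of NULLs, not real telemetry objects.

```r
# Placeholder list with a repeated animal id, as could result from two files.
tele_list <- list(Cilla = NULL, Gabs = NULL, Gabs = NULL, Toni = NULL)

# make.unique appends a numeric suffix to every repeat of a name.
names(tele_list) <- make.unique(names(tele_list))
print(names(tele_list))
# "Cilla" "Gabs" "Gabs.1" "Toni"
```

The first occurrence keeps its original name, so only the later duplicates are renamed.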
Is there any use case where multiple data files covering different time periods of the same animal are uploaded?
That could happen.
In that case each file's data will generate a separate telemetry object with a varied animal id. This may not be optimal, but it's difficult to distinguish the two cases:
I think we can only proceed under one assumption for these two cases, and provide some warning messages.
Alternatively, we can add an option to treat a name conflict under assumption A or B, if both are quite possible.
I implemented importing multiple files in the app.
Multiple datasets on the same individual are going to be cases where there were multiple collar deployments. They might differ in data quality, in which case their errors might need to be calibrated separately before merging, but they shouldn't overlap in time.
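If merging is added later, a sanity check that two deployments do not overlap in time could look like the sketch below. It uses base R only; the timestamps and the deployments_overlap helper are hypothetical.

```r
# Hypothetical timestamps from two collar deployments on the same individual.
first_deployment <- as.POSIXct(
  c("2005-01-01 00:00:00", "2005-03-01 00:00:00", "2005-06-30 00:00:00"), tz = "UTC")
second_deployment <- as.POSIXct(
  c("2005-08-01 00:00:00", "2006-01-31 00:00:00"), tz = "UTC")

# Two deployments overlap if each one starts before the other one ends.
deployments_overlap <- function(times_a, times_b) {
  span_a <- range(times_a)
  span_b <- range(times_b)
  span_a[1] <= span_b[2] && span_b[1] <= span_a[2]
}

overlap <- deployments_overlap(first_deployment, second_deployment)
if (overlap) {
  message("Deployments overlap in time; check the data before merging.")
}
```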
So even if we do merge them, it's not a simple task considering the errors.
Maybe we should put this task in data preparation step. Is the current treatment of varying names acceptable for now?
Yes. We should leave the option to merge for after uere.
@chfleming Is ctmm:::suggest.projection(data.frame) no longer available in the newest version of ctmm? I didn't find a different name for this function.
@xhdong-umd Using low-level functions is no longer necessary as of yesterday's update. I sent an email but forgot to point that out. See the example here: https://ctmm-initiative.github.io/ctmm/reference/projection.html
So instead of the previous code that took a long/lat data.frame, I just use median(buffalo,k=2) on the list of telemetry objects with different projections to get the new projection?
The help for projection says median returns the median of a telemetry object. It actually also works on a list of telemetry objects, right?
I met this error when importing a file:
> as.telemetry("/Users/xhdong/Projects/ctmm-Shiny/data/buffalo/Kruger African Buffalo, GPS tracking, South Africa.csv.zip")
Minimum sampling interval of 3 minutes in Cilla
Minimum sampling interval of 0 seconds in Gabs
Minimum sampling interval of 2 minutes in Mvubu
Minimum sampling interval of 0 seconds in Pepper
Minimum sampling interval of 0 seconds in Queen
Minimum sampling interval of 5 minutes in Toni
Error in rbind(proj)[, c("longitude", "latitude")] :
subscript out of bounds
Yes on the first questions and I'm looking into the import bug.
Should be fixed now.
Yes, I verified it and updated the app to use the new median function instead of the low-level function.
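For anyone landing on this thread later, the replacement for the low-level call follows the pattern from the projection help page linked above. The sketch assumes the ctmm package is installed and uses its bundled buffalo data; the requireNamespace guard is only there so the snippet does nothing when ctmm is missing.

```r
# Only run the ctmm part when the package is available.
has_ctmm <- requireNamespace("ctmm", quietly = TRUE)

if (has_ctmm) {
  library(ctmm)
  data(buffalo)  # example list of telemetry objects shipped with ctmm

  # Put every individual in the list on one shared projection,
  # derived from the data via median() with k = 2 as discussed above.
  projection(buffalo) <- median(buffalo, k = 2)
}
```

In the app, the list returned by importing several files would take the place of buffalo.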
Some users might have their data contained in multiple csv files, but the app currently only allows one file to be loaded at a time. It would be useful to have the option to load more than one file.