ctmm-initiative / ctmmweb

Web app for analyzing animal tracking data, built upon ctmm R package
http://biology.umd.edu/movement.html
GNU General Public License v3.0

variable number of data columns #29

Closed chfleming closed 7 years ago

chfleming commented 7 years ago

I just fixed a bug in as.telemetry for data with different information in them (some with HDOP others without). Previously as.telemetry would accidentally delete all of the individuals without HDOP. Now as.telemetry only deletes the missing HDOP column from individuals without HDOP values.

But I'm almost certain this now does not work in ctmm-webapp. The error message is below:

Warning: Error in rbindlist: Item 2 has 8 columns, inconsistent with item 1 which has 9 columns. If instead you need to fill missing columns, use set argument 'fill' to TRUE.
Stack trace (innermost first):
    77: rbindlist
    76: tele_list_to_dt [helpers.R#240]
    75: merge_animals [helpers.R#254]
    74: update_input_data [C:\Users\alub0001\AppData\Local\Temp\RtmpEZJsSE\shinyapp19f425af55e3\ctmm-webapp-master/server.R#28]
    73: data_import [C:\Users\alub0001\AppData\Local\Temp\RtmpEZJsSE\shinyapp19f425af55e3\ctmm-webapp-master/server.R#52]
    72: file_uploaded [C:\Users\alub0001\AppData\Local\Temp\RtmpEZJsSE\shinyapp19f425af55e3\ctmm-webapp-master/server.R#56]
    71: observeEventHandler [C:\Users\alub0001\AppData\Local\Temp\RtmpEZJsSE\shinyapp19f425af55e3\ctmm-webapp-master/server.R#62]
     7: runApp
     6: runUrl
     5: shiny::runGitHub
     4: eval [https://raw.githubusercontent.com/ctmm-initiative/ctmm-webapp/master/run.R#7]
     3: eval
     2: withVisible
     1: source

xhdong-umd commented 7 years ago

So the individuals with HDOP will have one extra column? The webapp merges all individuals into one data frame, so the columns need to be the same. I can give the individuals without HDOP an HDOP column of NAs. Will that NA HDOP column cause problems?

Can you point me to some example data so I can test it?

chfleming commented 7 years ago

In this case it's HDOP, but in other cases it will be something else.

I don't think the current ctmm functions would handle HDOP=NA well. HDOP=1 would work fine as a stop-gap solution for HDOP specifically.

xhdong-umd commented 7 years ago

With fill = TRUE, rbindlist fills any missing column with NA. If NA can be interpreted properly, the code will not need to change when other columns vary in the future.
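
As a minimal sketch of the fill behaviour described here, the following binds two made-up individuals with data.table. The table contents and the column names id, x and y are assumptions for illustration only; this is not the webapp's tele_list_to_dt code.

```r
library(data.table)

# Two hypothetical individuals; only the first has an HDOP column.
a <- data.table(id = "A", x = c(1, 2), y = c(3, 4), HDOP = c(1.2, 0.9))
b <- data.table(id = "B", x = c(5, 6), y = c(7, 8))

# Without fill = TRUE this call stops with the "inconsistent columns" error.
merged <- rbindlist(list(a, b), fill = TRUE)

# Rows from B carry NA in the HDOP column.
print(merged)
```

The merged table keeps every column seen in any input, so a new optional column would not require further changes to the binding step.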

For now I will add some code to manually set HDOP to 1.

xhdong-umd commented 7 years ago

@chfleming Can you share some data with me so I can test this?

chfleming commented 7 years ago

I don't have the specific data that triggered this because it's highly restricted, but it would be like a CSV with two animals, one with numbers for HDOP and the other with NA for HDOP values.
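
A file of that shape might look like the sketch below. The column names and all values are made up for illustration (they are not Movebank's attribute names and not the restricted data); the code only reads the text and reports which individuals have no HDOP values at all.

```r
# Hypothetical CSV: individual A has HDOP values, individual B has only NA.
csv_text <- "id,timestamp,longitude,latitude,HDOP
A,2017-01-01 00:00:00,31.75,-24.28,1.2
A,2017-01-01 01:00:00,31.76,-24.29,0.9
B,2017-01-01 00:00:00,31.80,-24.30,NA
B,2017-01-01 01:00:00,31.81,-24.31,NA"

tracks <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# TRUE for individuals whose HDOP values are all missing.
all_missing <- tapply(tracks$HDOP, tracks$id, function(h) all(is.na(h)))
print(all_missing)
```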

At the moment, I have everything in ctmm coded up to do all of the cleaning in as.telemetry and subsequent analysis functions assume nothing bad like NA. I may have to reconsider at some point.

xhdong-umd commented 7 years ago

Is the column name just HDOP?

chfleming commented 7 years ago

This is an array of the possible values that I have seen in CSV files (taken from as.telemetry code):

c("GPS.HDOP","HDOP","DOP")

The case can differ, and . can also be _.
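
One way to match those variants is sketched below. The helper standardize_hdop is hypothetical and is not the as.telemetry implementation; it simply compares names in upper case and treats _ like a dot.

```r
# Known HDOP aliases, as listed above.
hdop_aliases <- c("GPS.HDOP", "HDOP", "DOP")

# Hypothetical helper: rename any spelling of an alias to "HDOP".
standardize_hdop <- function(col_names) {
  # Compare in upper case, treating "_" the same as "."
  key <- toupper(gsub("_", ".", col_names, fixed = TRUE))
  col_names[key %in% hdop_aliases] <- "HDOP"
  col_names
}

renamed <- standardize_hdop(c("timestamp", "gps_hdop", "Dop", "longitude"))
print(renamed)  # "timestamp" "HDOP" "HDOP" "longitude"
```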

MoveBank format is not so restrictive.

xhdong-umd commented 7 years ago

Do you want to standardize the column name after import? The import process would still recognize all the variations, but would rename the column for all later processes. Otherwise you need to check these names every time they are needed.

chfleming commented 7 years ago

In as.telemetry I import the columns that ctmm can currently use and standardize them. HDOP values become HDOP unless a UERE value is specified, in which case the two are combined into an HERE column. In the near future, if HDOP is missing and the number of satellites is present, then an approximate HDOP column will be made.

xhdong-umd commented 7 years ago

So after import the column name will be just HDOP; this is all I need.

When the columns are combined into HERE, will HERE = 1 work for individuals without this column?

chfleming commented 7 years ago

Unfortunately an HERE=1 column would be bad. HDOP is proportional error. HERE is absolute (calibrated) error.

xhdong-umd commented 7 years ago

Will HERE=0 work?

chfleming commented 7 years ago

No, I only have an HERE column when calibrated errors are specified. HERE=0 would correspond to measurably zero error in the data.

xhdong-umd commented 7 years ago

So if one individual has HERE and another doesn't, there is no way to align them with the same columns?

The only solution I can think of is to set HERE=NA and check for NA in the related ctmm code.

chfleming commented 7 years ago

I see a couple of options, from easiest to hardest:

  1. You could remove NA columns from individuals before sending them to ctmm functions with a single wrapper function.
  2. Various ctmm functions could start looking for NA in the columns they try to pull. Right now everything I have coded assumed that all data was cleaned in or before as.telemetry.
  3. Restructure ctmm-webapp to store individuals not all in one data.frame type object, but in a list of data.frames that can have different number of columns.
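
The first option could be sketched roughly as follows. The helpers drop_na_columns and split_clean are hypothetical, and plain data frames stand in for telemetry objects here, so this shows the idea rather than working webapp code.

```r
# Hypothetical helper: drop columns that are entirely NA from one individual.
drop_na_columns <- function(df) {
  keep <- vapply(df, function(col) !all(is.na(col)), logical(1))
  df[, keep, drop = FALSE]
}

# Split a merged table back into per-individual tables without NA padding.
split_clean <- function(merged, id_col = "id") {
  lapply(split(merged, merged[[id_col]]), drop_na_columns)
}

merged <- data.frame(
  id   = c("A", "A", "B", "B"),
  x    = c(1, 2, 5, 6),
  HDOP = c(1.2, 0.9, NA, NA)
)
cleaned <- split_clean(merged)
names(cleaned$A)  # "id" "x" "HDOP"
names(cleaned$B)  # "id" "x"
```

The result is also the shape that the third option describes: a list of data frames whose column counts can differ.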

xhdong-umd commented 7 years ago

A wrapper function is easy, but ctmm functions are involved in a lot of places, and some are called by other functions, so they are not obvious at first.

The fundamental difference is that ctmm treats each individual separately, maybe in a list. However, to use ggplot with all the individuals I need to merge them into one data frame, and all my code is based on this data frame since it also makes it easier to select a subset dynamically.

Changing this single data.frame structure into a list of data.frames would need a very substantial rewrite and would make a lot of things much harder: many things that take 1-2 lines of data.table now might need a lot of list manipulation.

I'll think more about the wrapper function approach.

xhdong-umd commented 7 years ago

I found that I have maintained a list of telemetry objects all along, so I only need to ensure that the list doesn't have NA columns. I have been very careful to keep the list in sync with the data frame, which is sometimes quite cumbersome.

Looks like I only need to add fill = TRUE. The data frame will then have NA columns, but the telemetry objects stay unchanged.

@chfleming can you test if it works?

xhdong-umd commented 7 years ago

@chfleming I made some simulated data and tested it in the app. It worked until the home range calculation:

Warning: Error in rbind: numbers of columns of arguments do not match
Stack trace (innermost first):
    110: rbind
    109: rbind
    108: do.call
    107: akde.list
    106: akde
    105: eval
    104: eval
    103: withProgress
    102: <reactive:selected_hrange_list> [/Users/xhdong/Projects/ctmm-Shiny/shiny/app/server.R#1450]

It seems that akde cannot handle a list of telemetry objects with different columns.

akde(select_models()$tele_list,
     CTMM = select_models()$models)
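
The same kind of failure can be reproduced in base R, without ctmm, by binding data frames whose column counts differ. This is only a sketch with made-up data; what akde.list does internally is not shown in this thread, and restricting to the shared columns is an illustrative workaround, not the fix that went into ctmm.

```r
# Two hypothetical tables with different column counts.
a <- data.frame(x = 1:2, HDOP = c(1.2, 0.9))
b <- data.frame(x = 3:4)

# rbind refuses to combine them; capture the message instead of stopping.
result <- tryCatch(
  do.call(rbind, list(a, b)),
  error = function(e) conditionMessage(e)
)
print(result)

# Binding only the columns both tables share succeeds.
shared <- intersect(names(a), names(b))
combined <- rbind(a[, shared, drop = FALSE], b[, shared, drop = FALSE])
print(combined)
```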

xhdong-umd commented 7 years ago

Another question about the data: I assume it is a CSV that includes all individuals, so the CSV should have an HDOP column, but some individuals have no values in that column? I created a modified buffalo data file in this format, in which Cilla and Gabs have HDOP values. However, as.telemetry kept the HDOP column only for Cilla.

chfleming commented 7 years ago

@NoonanM is this the bug you found recently? It arose from some changes where as.telemetry was dropping rows/individuals with missing columns and now it isn't. I'm going to fix it today.

@xhdong-umd The Movebank CSV may or may not have an HDOP column. You can email me (or link to) the fake data and I will take a look at that. HDOP should only be dropped when it's NA-valued.

xhdong-umd commented 7 years ago

@chfleming To save a data set with variable columns as a Movebank CSV, we have to have an HDOP column in the header and write NA or empty values for individuals without HDOP, right? A CSV cannot handle variable columns in the same file.

I attached the CSV file here: varied_buffao.csv.zip

chfleming commented 7 years ago

Fixes for those two issues are up on GitHub.

xhdong-umd commented 7 years ago

@chfleming I installed the latest version but still got the same error on rbind. Could it be this line?

chfleming commented 7 years ago

I used the wrong variable name in one line. It's tested to work on a case that failed before recompiling.

xhdong-umd commented 7 years ago

I verified it, and it seems to be working now.