bgctw / RespChamberProc

Processing concentration data from chamber measurements

Allow working with different collars in the same time series #6

Closed. bgctw closed this issue 2 months ago.

bgctw commented 3 months ago

Using a common sensor (and logger) that connects to different chambers is a common setup.

Currently, RespChamberProc assumes that all chunks relate to the same collar.

Provide a mapping of chunks to different collars/repIds, and add the collar number somewhere in the plots.

bgctw commented 2 months ago

@michaelstrupler and Oscar: Can you, please, have a look at branch collarchunk? You can install it with devtools::install_github("bgctw/RespChamberProc", ref="collarchunk") and then look at the vignette switchingChambers.

I introduced the notion of a collar, which encapsulates chamber dimensions and lag time. Different chunks can be associated with different collars, and the collar is by default printed in the facet heading of diagnostic plots, together with the chunkId. Does this help with your setup? You can associate your valve with the chamber variable (e.g. by specifying it as the value of argument colIndex in subsetContiguous).
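For illustration, a minimal sketch of that association (hedged: the column name solenoid_valvesInt comes from the script discussed later in this thread, and the colTime argument is named as I recall it from the package docs; check ?subsetContiguous for the full argument list):

library(RespChamberProc)
# split the time series into contiguous measurement cycles (chunks),
# using the valve number column to identify the collar
dsChunk <- subsetContiguous(ds, colTime = "TIMESTAMP", colIndex = "solenoid_valvesInt")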

Moreover, I introduced parallel fitting of different chunks using the furrr package. You can specify the number of worker processes with plan(multisession, workers = 4).
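For reference, the minimal setup (plan() comes from the future framework that furrr builds on):

library(future)
library(furrr)
# run subsequent chunk fits in four parallel R sessions
plan(multisession, workers = 4)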

michaelstrupler commented 2 months ago

Thanks, it seems to work and I have adapted the script. However, since the implementation of these features, the script (https://github.com/oscarperezpriego/ChamberProc/blob/michael/RespChamberProc/main/Chamber_PICARROv6.r) has become ultra slow…

I have no idea what the reason could be. Does the furrr package replace the doSNOW package we used for parallel computing before?


michaelstrupler commented 2 months ago

It seems that the slowdown is somehow related to the non-equidistant timesteps. If I create equidistant timesteps again, a chunk fit (resFit) takes a few minutes (i.e. < ca. 3), whereas with non-equidistant timesteps it takes around 10 times longer...

bgctw commented 2 months ago

Does creating equidistant time steps give you a time series with fewer records?

If so, it may be better to thin your records before processing (dropping records that arrive before the next full second has elapsed) rather than interpolating your measurements to one-second steps.
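A minimal sketch of such thinning, keeping only the first record within each full second (this assumes a POSIXct TIMESTAMP column; it is a suggestion, not a package function):

library(dplyr)
ds_thin <- ds %>%
  mutate(.sec = floor(as.numeric(TIMESTAMP))) %>%  # whole seconds since epoch
  distinct(.sec, .keep_all = TRUE) %>%             # keep the first record per second
  select(-.sec)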

If you fit concentration changes to very many records, this slows down the computation. But thinning the time series might increase the uncertainty estimates. I implemented a thinning of the central part of the concentration time series to nRecCentral = 20 records when estimating the uncertainty due to the leverage of records at the edges (sigmaBootLeverage). Please reinstall the package from the branch (devtools::install_github("bgctw/RespChamberProc", ref="collarchunk")) and try whether this modification helps your case.

bgctw commented 2 months ago

Another thought: I assume that "fitting a chunk" does not include the time to produce the plot.

The modification above, of course, does not decrease plotting time. If plotting is the bottleneck, you or RespChamberProc would need to implement a thinning before plotting.
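A hedged sketch of such thinning before plotting (dsChunk and iChunk are the chunked data and chunk id as in the script discussed below; this is not a package feature):

library(dplyr)
# keep only every 10th record of each chunk for plotting
dsPlot <- dsChunk %>%
  group_by(iChunk) %>%
  slice(seq(1, n(), 10)) %>%
  ungroup()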

bgctw commented 2 months ago

Another observation: I wrongly thought that you apply calcClosedChamberFluxForChunkSpecs with a missing lag time (tlag = NA). This is not recommended, because it is not robust and can additionally slow down computation, especially on long time series. Better to determine the lag time by inspecting a few chunks and subsequently fix it (as you did, as I see now).

In your script you apply a lag time of 16 seconds for all gases. If you want to vary this by trace gas (as I remember from the videocon), you need to use a different collar_spec per trace gas.
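A purely hypothetical sketch of what per-gas specs could look like. The column names (collar, area, volume, tlag) are assumptions mirroring the calcClosedChamberFlux arguments, not the confirmed collarchunk API; consult the switchingChambers vignette for the actual collar_spec format:

library(tibble)
library(dplyr)
# hypothetical: one spec per trace gas, differing only in the fixed lag time
collar_spec_co2 <- tibble(collar = 1:2, area = 0.4 * 0.4, volume = 0.4^3, tlag = 16)
collar_spec_h2o <- mutate(collar_spec_co2, tlag = 30)  # e.g. a slower-responding gas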

michaelstrupler commented 2 months ago

Actually, the time series gets even longer, as the missing timesteps are interpolated. See the first lines of the non-interpolated timesteps (ds_subset) and of the interpolated timesteps:

head(ds_subset):

         DATE         TIME FRAC_DAYS_SINCE_JAN1 FRAC_HRS_SINCE_JAN1 JULIAN_DAYS EPOCH_TIME
1: 2022-05-31 08:01:59.987             150.2931            3607.033    151.2931 1653976920
2: 2022-05-31 08:02:01.676             150.2931            3607.034    151.2931 1653976922
3: 2022-05-31 08:02:03.944             150.2931            3607.034    151.2931 1653976924
4: 2022-05-31 08:02:04.852             150.2931            3607.035    151.2931 1653976925
5: 2022-05-31 08:02:06.303             150.2931            3607.035    151.2931 1653976926
6: 2022-05-31 08:02:07.615             150.2931            3607.035    151.2931 1653976928
   ALARM_STATUS INST_STATUS solenoid_valves N2O_dry1min  CO2_dry  CH4_dry      H2O      NH3
1:        65536         451       0.0000000   0.2947989 448.4094 1.992880 1.210567 9.941032
2:        65536         451       0.9418886   0.2947990 448.4094 1.992749 1.210567 9.941032
3:        65536         451       0.1404110   0.2928492 448.4095 1.992572 1.210567 9.941392
4:        65536         451       0.0000000   0.2912658 448.3468 1.992298 1.202615 9.938338
5:        65536         451       0.0000000   0.2895435 451.0968 1.991855 1.189846 9.933532
6:        65536         451       0.0000000   0.2883773 453.5823 1.991454 1.178309 9.928979
   CHAMBER_TEMP_sync SOIL_TEMP_sync CHAMBER_PRESSURE_sync SWITCH_sync           TIMESTAMP
1:                 0              0                     0           0 2022-05-31 08:01:59
2:                 0              0                     0           0 2022-05-31 08:02:01
3:                 0              0                     0           0 2022-05-31 08:02:03
4:                 0              0                     0           0 2022-05-31 08:02:04
5:                 0              0                     0           0 2022-05-31 08:02:06
6:                 0              0                     0           0 2022-05-31 08:02:07

head() of the interpolated timesteps:

            TIMESTAMP  CO2_dry  CH4_dry   H2Oppt   NH3_dry   N2O_dry collar
1 2022-05-31 08:01:59 448.4094 1.992880 12.10567 0.1006285 0.2947989      0
2 2022-05-31 08:02:00 448.4094 1.992814 12.10567 0.1006285 0.2947989      0
3 2022-05-31 08:02:01 448.4094 1.992749 12.10567 0.1006285 0.2947990      0
4 2022-05-31 08:02:02 448.4094 1.992660 12.10567 0.1006303 0.2938241      0
5 2022-05-31 08:02:03 448.4095 1.992572 12.10567 0.1006321 0.2928492      0
6 2022-05-31 08:02:04 448.3468 1.992298 12.02615 0.1005931 0.2912658      0

I also attach an MWE:

# MWE ======================================================================
# load libraries -----------------------------------------------------------
library(rlang)
library(changepoint)
library(nlme)
library(segmented)
library(tibble)
library(plyr)
library(dplyr)
library(purrr)
library(RespChamberProc)
library(devtools)
library(stringr)
library(ggplot2)
library(data.table)
library(lubridate)
library(tidyr)
library(doSNOW)  # is this still used now that we use furrr?
library(furrr)
library(elevatr)
library(sf)
library(openmeteo)  # retrieves historical weather data from https://open-meteo.com/en/docs/historical-weather-api

# Setup ---------------------------------------------------------------------
## Set working directory
setwd("/Users/ms/Research Projects/Espana/UCO_Chamber/RespChamberProc/RespChamberProc")

latDeciDeg <- 37.91514875822371  # latitude in geographical coordinates (decimal degrees; crs = 4326)
lonDeciDeg <- -4.72405536319233  # longitude in decimal degrees

fileName <- "JFAADS2174-20220531-080209-DataLog_User.dat"
data_pathname <- "data/"  # "Data_Adrian/Agosto/19/"
results_pathname <- "../"

# create a folder named after the measurement archive to save results and plots
dir.create(paste0(results_pathname, "results"))
results_dir <- paste0(results_pathname, "results/", str_sub(fileName, end = -18))
dir.create(results_dir)

# fit chambers in parallel inside calcClosedChamberFluxForChunkSpecs
plan(multisession, workers = 4)

# Read and prepare data ------------------------------------------------------
ds0 <- fread(paste0(data_pathname, fileName), sep = "auto")
ds0$TIMESTAMP <- as.POSIXct(paste0(ds0$DATE, " ", ds0$TIME), "%Y-%m-%d %H:%M:%S", tz = "UTC")

## The logger accumulates data from previous field campaigns;
## here we subset the data of a given field campaign.
## (Enter start and end time of the observations; convert to UTC.)
ds_subset <- subset(ds0, as.numeric(TIMESTAMP) >= as.numeric(as.POSIXctUTC("2022-05-31 08:00:00")))
ds_subset <- subset(ds_subset, as.numeric(TIMESTAMP) <= as.numeric(as.POSIXctUTC("2022-05-31 12:00:00")))
ds <- ds_subset

# load additional functions ---------------------------------------------------
# function to automatically get elevation (AWS Mapzen elevation tiles:
# https://registry.opendata.aws/terrain-tiles/), temperature, and atmospheric
# pressure (data: https://open-meteo.com/en/docs/historical-weather-api)
# for a sampling location (entered in decimal degrees above)
source("main/functions/getAdditionalWeatherVariables.R")
source("main/functions/ChunkPlots.R")  # generates plots for each chunk for various gases
source("main/functions/convertToRegularTimesteps.R")

# Correct gases ----------------------------------------------------------------
Additional_Weather_Data <- getAdditionalWeatherVariables(
  latDeciDeg, lonDeciDeg,
  format(min(ds$TIMESTAMP), "%Y-%m-%d"), format(max(ds$TIMESTAMP), "%Y-%m-%d"))

# The PICARRO IRGA gives dry mole fractions for CO2, N2O, and CH4, but not for NH3 and H2O
ds$solenoid_valvesInt <- ds$solenoid_valves %>% as.integer()
ds$H2Oppt <- ds$H2O * 10  # H2O from PICARRO is in %; needs to be in ppt -> multiply by 10
ds$N2O_dry <- ds$N2O_dry1min
ds$NH3_dry <- 10E-3 * corrConcDilution(ds, colConc = "NH3", colVapour = "H2Oppt")  # NH3 from PICARRO is in ppb -> multiply by 10^-3 to get ppm
ds$H2O_dry <- corrConcDilution(ds, colConc = "H2Oppt", colVapour = "H2Oppt")

## h, Pa, and AirTemp are obtained directly from freely available meteo and
## elevation data for the given coordinates during the measurement interval
## with the "getElevationAndAtmosphericPressure" script (see details therein).
## If desired, they can also be set manually:
# ds$h <- 106  # Córdoba elevation (m above sea level)
# ds$Pa <- 101325*(1 - 2.25577*10^-5*h)^5.25588  # (Pa)
# ds$AirTemp <- 25  # (degrees Celsius)

ds <- ds[!duplicated(ds$TIMESTAMP), ]  # remove duplicated rows

# interpolate measurement timestamps for the whole dataset
## create a continuous timestamp vector (here: interval = 1 second)
regular_timesteps <- seq(min(ds$TIMESTAMP), max(ds$TIMESTAMP), by = "1 sec")
collar_df <- tibble("TIMESTAMP" = ds$TIMESTAMP, "collar" = ds$solenoid_valvesInt)
ds <- convertToRegularTimesteps(ds, c("CO2_dry", "CH4_dry", "H2Oppt", "NH3_dry", "N2O_dry"), regular_timesteps)
ds <- left_join(ds, collar_df, by = join_by("TIMESTAMP" == "TIMESTAMP")) %>%
  fill(., collar, .direction = "down")

nrow(ds_subset)
nrow(ds)
michaelstrupler commented 2 months ago

It does not seem that plotting is the bottleneck...


michaelstrupler commented 2 months ago

Thanks for the hint!


bgctw commented 2 months ago

I am not able to debug the problem with the provided code (it changes paths, loads other scripts and data from the file system, ...).

For a minimal working example (MWE), please,

- provide a small dataset (~10 chunks, i.e. concentration measurement cycles) at a download location (e.g. inside a github repository https://github.com/bgctw/RespChamberProc/blob/master/inst/genData/SMANIE_Chamber1_26032015.zip), then download and load it in your MWE code, e.g.

  filename <- "example.zip"
  download.file(
    "https://github.com/bgctw/RespChamberProc/blob/master/inst/genData/SMANIE_Chamber1_26032015.zip?raw=TRUE",
    filename, mode = "wb")
  ds <- readDat(unz(filename, filename = unzip(filename, list = TRUE)[1, "Name"]), tz = "UTC")
  head(ds)

- try to condense your code to the bare steps necessary to replicate the problem; loading many additional packages often shows that it is not condensed (e.g. provide the temperature with the example dataset you create, rather than querying it from a service)

- only rely on relative paths, and only load results from there that have been produced by the MWE

bgctw commented 2 months ago

Regarding the equidistant time steps:

Producing a longer time series is problematic. The uncertainty of the estimated flux decreases with the number of records provided, so when interpolation supplies additional records, subsequent uncertainty estimates are biased low.

michaelstrupler commented 2 months ago

Thanks for your comments, please find my answers below:

> provide a small dataset (~10 chunks) at a download location, then download and load it in your MWE code

I have created a first version of the package "ChamberProc" that uses "RespChamberProc" as a dependency:

devtools::install_github("michaelstrupler/ChamberProc")

The sample data is now included in the package; you can address it with "sample_PICARRO_data".

> try to condense your code ... (e.g. provide the temperature with the example dataset you create, rather than querying it from a service)

The dataset I got from a colleague (and all the other datasets they have) does not contain these data; therefore we sample these additional values from a service.

> only rely on relative paths, and only load results from there that have been produced by the MWE

Ok. I send you the MWE as an .R file per mail. Best, Michael

bgctw commented 2 months ago

I installed your package and tried running the code you sent by email (which is not quite minimal). I get an error on the weather line:

> Additional_Weather_Data <- getAdditionalWeatherVariables(latDeciDeg, lonDeciDeg,format(min(ds$TIMESTAMP),"%Y-%m-%d"),format(max(ds$TIMESTAMP),"%Y-%m-%d"))
Error in st_as_sf(sampling_location, coords = c("lon", "lat"), crs = 4326) : 
  could not find function "st_as_sf"

Maybe you need to adjust the package imports. I figured out that the function works after I explicitly load the sf package.
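One way to fix this on the package side (a hedged suggestion; usethis is not part of this thread) is to declare sf in Imports and call it with an explicit namespace:

# run once in the ChamberProc package source directory:
# adds sf to the Imports field of DESCRIPTION
usethis::use_package("sf")
# then call sf with an explicit namespace inside package functions, e.g.
# sf::st_as_sf(sampling_location, coords = c("lon", "lat"), crs = 4326)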

First, I create a dataset from inside your script (but without adjusting timesteps) so that we can discuss it:

ds_mwe1 <- select(dsChunk, iChunk, collar, TIMESTAMP, CO2_dry, AirTemp, Pa)

I put it on GitHub and use it from a self-contained MWE:

library(RespChamberProc)
library(dplyr)
ds_mwe1 <- readRDS(url("https://github.com/bgctw/RespChamberProc/raw/collarchunk/develop/issue_cases/issue6_mwe1.rds?raw=TRUE"),"rb")
df <- filter(ds_mwe1, iChunk==4)
plot(df$CO2_dry ~ df$TIMESTAMP)  # strongly autocorrelated
resFit <- calcClosedChamberFlux(df
                                , fRegress = c(lin = regressFluxLinear, tanh = regressFluxTanh, exp = regressFluxExp, poly= regressFluxSquare)
                                , debugInfo = list(omitEstimateLeverage = TRUE) # faster
                                , colConc = "CO2_dry", colTime = "TIMESTAMP"    # column names: conc ~ time in seconds
                                , colTemp = "AirTemp", colPressure = "Pa"       #Temp in degC, Pressure in Pa
                                , volume = 0.4*0.4*0.4, area = 0.4*0.4
                                , minTLag = 60,  maxLag = 120
                                , concSensitivity = 0.01
)

# thin data
df2 <- slice(df, seq(1, nrow(df), 10))
plot(df2$CO2_dry ~ df2$TIMESTAMP) # still some autocorrelation
resFit2 <- calcClosedChamberFlux(df2
                                , fRegress = c(lin = regressFluxLinear, tanh = regressFluxTanh, exp = regressFluxExp, poly= regressFluxSquare)
                                , debugInfo = list(omitEstimateLeverage = TRUE) # faster
                                , colConc = "CO2_dry", colTime = "TIMESTAMP"    # column names: conc ~ time in seconds
                                , colTemp = "AirTemp", colPressure = "Pa"       #Temp in degC, Pressure in Pa
                                , volume = 0.4*0.4*0.4, area = 0.4*0.4
                                , minTLag = 60,  maxLag = 120
                                , concSensitivity = 0.01
)
rbind(resFit, resFit2)

When I plot the CO2 time series of your first chunk, I see a large autocorrelation. [figure: CO2_dry vs. TIMESTAMP for chunk 4, showing a strongly autocorrelated concentration trace] Fitting a nonlinear model to such a time series, especially combined with break-point detection, is going to be hard and to take longer.

After thinning the time series to keep only every 10th point, some autocorrelation remains. This means we did not throw away much information.

The fitting gets a lot faster, and the results are very similar.

   flux fluxMedian sdFlux  tLag lagIndex autoCorr    AIC sdFluxRegression sdFluxLeverage iFRegress sdResid iqrResid    r2 times       model 
  <dbl>      <dbl>  <dbl> <dbl>    <int>    <dbl>  <dbl>            <dbl>          <dbl>     <dbl>   <dbl>    <dbl> <dbl> <list>      <list>
1 0.204         NA 0.0354    61       50    0.981 -857.            0.0354             NA         2   0.380    0.466 0.951 <dbl [380]> <gnls>
2 0.233         NA 0.0287    62        6    0.909   40.0           0.0287             NA         2   0.381    0.526 0.951 <dbl [38]>  <gnls>

Note that I could not find any connection between the performance and equidistant time steps, which you reported to speed things up.

I do not want to implement an automatic thinning of time series in RespChamberProc, but rather let the user decide. For your case I recommend this thinning (every 8th record instead of every 10th as above, to be conservative):

ds <- ds_mwe1 %>% group_by(iChunk) %>% slice(seq(1, n(), 8)) %>% ungroup()

bgctw commented 2 months ago

I ran the above code again without debugInfo = list(omitEstimateLeverage = TRUE) and got:

# A tibble: 2 × 15
   flux fluxMedian sdFlux  tLag lagIndex autoCorr    AIC sdFluxRegression sdFluxLeverage iFRegress sdResid iqrResid    r2 times       model 
  <dbl>      <dbl>  <dbl> <dbl>    <int>    <dbl>  <dbl>            <dbl>          <dbl>     <dbl>   <dbl>    <dbl> <dbl> <list>      <list>
1 0.204      0.229 0.0354    61       50    0.981 -857.            0.0354         0.0188         2   0.380    0.466 0.951 <dbl [380]> <gnls>
2 0.233      0.234 0.0460    62        6    0.909   40.0           0.0287         0.0460         2   0.381    0.526 0.951 <dbl [38]>  <gnls>

When using the full autocorrelated series, the leverage estimate fails, and the flux estimate is strongly influenced by the very negative concentration wiggle at the start, just after the time lag. Omitting just a few records (as implemented) does not remove this first high-leverage wiggle, which induces a bias in the estimated flux. This can be seen from the large difference between the original flux and the median of the leverage-bootstrapped flux (fluxMedian), despite a very low uncertainty (sdFlux).

Hence, it is not only a performance issue but a larger one: critically examine the results of fitting time series with high autocorrelation.
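For example, a quick hedged check on a result table like the one printed above (res stands for any result tibble with these columns):

# flag chunks whose direct flux estimate differs from the
# leverage-bootstrap median by more than twice the reported uncertainty
suspect <- subset(res, abs(flux - fluxMedian) > 2 * sdFlux)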

michaelstrupler commented 2 months ago

> Maybe you need to adjust the package imports. I figured out that the function works after I explicitly load the sf package.

Thanks for the hint, I have added the package "sf" to "Imports" in the DESCRIPTION file. So this should work now :)

> After thinning the time series to keep only every 10th point, some autocorrelation remains. [...] The fitting gets a lot faster, and the results are very similar.

Ok, I see: the thinning seems to make a lot of sense in terms of efficiency. However, when I apply the thinning to the whole dataset that was provided to me, i.e. via

dsChunk %>% group_by(iChunk) %>% slice(seq(1, n(), 8)) %>% ungroup()

to sample_PICARRO_data (which is included in the package), one has to select the tLag window carefully for the code to run without error, especially for the H2O data, which seems to be of very bad quality.

> I do not want to implement an automatic thinning of time series in RespChamberProc, but rather let the user decide.

I agree that it is best to let the user decide about the thinning (and the selected parameters). I will create a vignette (my first ever ;) for the package, adapted for the PICARRO system, where I will recommend thinning the data if the sample points are that dense.