Open mwdunlap2004 opened 2 weeks ago
Rob I was thinking of doing both steps in the same workflow, but I think once we have the process completed it could easily be separated. This is my take on our step to create the daily precipitation data in dh_timeseriesweather, it isn't working right now, as it isn't inserting any data, I'm thinking this is related to the varid, in this step, we simultaneously need to make sure our varkey is the hourly data, and that we change it to the daily varid, how do we do that? I did check and there is a hydroid for nldas2_precip_daily, its 1464, I'm just unsure on how to handle us having two different values in this step. I don't think calc_raster has that step, they join on the hydroid, but they don't update it in that step.
varkey=$1 \set varkey '$varkey' \n end_epoch = $2
How to make our epoch
select extract(epoch from ('2021-05-01'::timestamp));
insert into tmp_dh_timeseries_weather ( tstime,tsendtime, featureid, entity_type, rast, bbox, varid)
select min(a.tstime),max(a.tsendtime), a.featureid, a.entity_type, ST_Union(a.rast, 'SUM') as rast, a.bbox, b.hydroid as varid
from dh_timeseries_weather as a
left outer join tmp_dh_timeseries_weather as dupe
on (
a.tstime = dupe.tstime
and a.tsendtime = dupe.tsendtime
and a.varid = dupe.varid
and a.bbox = dupe.bbox
)
left outer join dh_variabledefinition as b
on(
a.varid = b.hydroid AND
b.varkey = 'varkey'
)
where dupe.tid is null
AND b.varkey = 'nldas2_obs_hourly'
AND extract(date from to_timestamp(a.tsendtime)) = extract(date from to_timestamp(end_epoch))
GROUP BY a.featureid, a.entity_type, a.bbox, b.hydroid,
extract(day from to_timestamp(a.tsendtime)),
extract(year from to_timestamp(a.tsendtime)),
extract(month from to_timestamp(a.tsendtime));
@mwdunlap2004 I think there are a few issues here with your psql variables. First, I'm a little confused by these lines:
varkey=$1
\set varkey '$varkey' \n
end_epoch = $2
Are arguments $1
and $2
coming in from the command line? If so, I'd suggest declaring these first and then using \set
(basically, switch lines 2 and 3). For best practice, let's be more specific in our declaration of varkey too. We should try to use the brackets notation with string interpolation, though it's not really necessary here (and thus we are not always consistent):
varkey=$1
end_epoch = $2
\set varkey '${varkey}' \n
One reason your query is not working, is the following:
left outer join dh_variabledefinition as b
on(
a.varid = b.hydroid AND
b.varkey = 'varkey'
)
This is literally joining on varkey = 'varkey'
, so it's looking for all entries where varkey
is equal to the word 'varkey'. So not what we want. Instead, you need to call your SQL variable using the colon notation:
left outer join dh_variabledefinition as b
on(
a.varid = b.hydroid AND
b.varkey = :'varkey'
)
As for setting the daily varkey, look at what you are sending to INSERT
. Rather than inserting b.hydroid
, send a string variable that you need instead e.g. 'dailyvarkeyID' as varid
insert into tmp_dh_timeseries_weather ( tstime,tsendtime, featureid, entity_type, rast, bbox, varid)
select min(a.tstime),max(a.tsendtime), a.featureid, a.entity_type, ST_Union(a.rast, 'SUM') as rast, a.bbox, b.hydroid as varid
@COBrogan great observations and points of style. I would add:
\set
command is part of a quoted string variable raster_sql
that is assembled in the script then sent to the database via the line that starts echo $raster_sql | psql...
The lines like varkey=$2
are NOT part of that SQL string variable, and should all be placed at the top of the script before you begin to declare the raster_sql
variable.
@mwdunlap2004 per your question about the other varkey/varid you will simply need to change the script to expect a second varkey for the daily variable that you are inserting.
I think you definitely need to separate the creation of the daily total and the hourly fraction into separate scripts. This will simplify debugging, increase readability and also allow us to utilize the hourly-to-daily separately. If you don't separate them you will now need to pass in 3 varkeys
as arguments: the hourly data varkey
, the hourly fraction varkey
and the daily varkey
-- another reason that splitting simplifies things.
Based on your suggestions here is my change to the code to insert our daily rasters into dh_timeseries_weather, however I do have a few questions about how this is working. First in this example we set dailyvarkeyID as varid, does this mean we'd want this to be a number '1464' in this case, as opposed to the character string I had used for our fraction raster step previously. What I mean is that in the past we had used a left join to update the records in our table using this new hydroid based on the dh_variabledefintion table and a new varkey.
Second, I am working under the assumption that we would want our epoch to be used as an argument into this function, but would we rather create it when we set the variables? This would mean the argument is the date and then I could do something like:
\set epoch '${extract(epoch from ('date'::timestamp))}' \n
I'm not sure if this would work, but that brings me to my third question, which is how do I declare these variables in my environment? I tried this to set our date, but I'm not sure how to check if it worked, or how to call my variable once I've done this.
Do $$ Declare data_date DATE; BEGIN data_date = '2021-05-01';
END $$;
hourlyvarkeyID=$1 dailyvarkeyID=$2 end_epoch = $3
rastersql="
\\set hourlyvarkeyID '${hourlyvarkeyID}' \n
\\set dailyvarkeyID '${dailyvarkeyID}' \n
insert into tmp_dh_timeseries_weather ( tstime,tsendtime, featureid, entity_type, rast, bbox, varid)
select min(a.tstime),max(a.tsendtime), a.featureid, a.entity_type, ST_Union(a.rast, 'SUM') as rast, a.bbox, :'dailyvarkeyID' as varid
from dh_timeseries_weather as a
left outer join tmp_dh_timeseries_weather as dupe
on (
a.tstime = dupe.tstime
and a.tsendtime = dupe.tsendtime
and a.varid = dupe.varid
and a.bbox = dupe.bbox
)
left outer join dh_variabledefinition as b
on(
a.varid = b.hydroid
)
where dupe.tid is null
AND b.varkey =: 'hourlyvarkeyID'
AND extract(day from to_timestamp(a.tsendtime)) = extract(day from to_timestamp(end_epoch))
AND extract(year from to_timestamp(a.tsendtime)) = extract(year from to_timestamp(end_epoch))
AND extract(month from to_timestamp(a.tsendtime)) = extract(month from to_timestamp(end_epoch))
GROUP BY a.featureid, a.entity_type, a.bbox, b.hydroid,
extract(day from to_timestamp(a.tsendtime)),
extract(year from to_timestamp(a.tsendtime)),
extract(month from to_timestamp(a.tsendtime));"
This is my updated code, it uses start and end_date and converts them into epoch, or sets them to -1 if we don't give them a date, it uses our defined hourly and varkeyID,uses two left outer joins it also does the time matching based on our epoch. However, how do I run this code to see if it works? Because I tried to set the variables in SQL, and it said that ' \ ' was an invalid command, is it because this is really a bash command not an SQL one?
hourlyvarkeyID=$1
dailyvarkeyID=$2
start_date =-1
end_date =-1
if [ $# -gt 2 ]; then
start_date=$3
fi
if [ $# -gt 3 ]; then
end_date=$4
fi
if [ "$start_date" == "-1" ]; then
start_epoch='-1'
else
start_epoch=$(date -d "$start_date" +%s)
fi
if [ "$end_date" == "-1" ]; then
end_epoch='-1' # Or leave it undefined
else
end_epoch=$(date -d "$end_date" +%s)
fi
rastersql="
\\set hourlyvarkeyID '$hourlyvarkeyID' \n
\\set dailyvarkeyID '$dailyvarkeyID' \n
\\set start_epoch $start_epoch
\\set end_epoch $end_epoch
insert into tmp_dh_timeseries_weather ( tstime,tsendtime, featureid, entity_type, rast, bbox, varid)
select min(data.tstime),max(data.tsendtime), data.featureid, data.entity_type, ST_Union(data.rast, 'SUM') as rast, data.bbox, daily.hydroid as varid
from dh_timeseries_weather as data
left outer join tmp_dh_timeseries_weather as dupe
on (
data.tstime = dupe.tstime
and data.tsendtime = dupe.tsendtime
and data.varid = dupe.varid
and data.bbox = dupe.bbox
)
left outer join dh_variabledefinition as hour
on(
data.varid = hour.hydroid
)
left outer join dh_variabledefinition as daily
on(
daily.varkey = :'dailyvarkeyID'
)
where dupe.tid is null
AND hour.varkey = :'hourlyvarkeyID'
and ( (data.tstime >= $start_epoch) or ($start_epoch = -1) )
and ( (data.tsendtime <= $end_epoch) or ($end_epoch = -1) )
GROUP BY data.featureid, data.entity_type, data.bbox, daily.hydroid,
extract(day from to_timestamp(data.tsendtime)),
extract(year from to_timestamp(data.tsendtime)),
extract(month from to_timestamp(data.tsendtime));"
Here is my attempt at making the delete statement: It takes in similar requests, using the varkeyID and the start and end dates to create our epochs and then deleting the rows in our data that match those values, it also has the default of -1 to delete all the rows, so that if you want all the data for the daily table, it will delete all the rows.
dailyvarkeyID=$1
start_date=-1
end_date=-1
if [ $# -gt 1 ]; then
start_date=$2
fi
if [ $# -gt 2 ]; then
end_date=$3
fi
if [ "$start_date" == "-1" ]; then
start_epoch=-1
else
start_epoch=$(date -d "$start_date" +%s)
fi
if [ "$end_date" == "-1" ]; then
end_epoch=-1
else
end_epoch=$(date -d "$end_date" +%s)
fi
deletedaily="
\\set dailyvarkeyID '$dailyvarkeyID'
\\set start_epoch $start_epoch
\\set end_epoch $end_epoch
delete from tmp_dh_timeseries_weather as data
using dh_variabledefinition as variables
where data.varid = variables.hydroid
and variables.varkey = :'dailyvarkeyID'
and ( (data.tstime >= :start_epoch) or (:start_epoch = -1) )
and ( (data.tsendtime <= :end_epoch) or (:end_epoch = -1) )
;"
@mwdunlap2004 Not sure if this answers your question about set
but you only need one slash when running the command manually e.g.
\set testVar
select :testVar
However, when you are writing the Bash command, the backslash has to be escaped so that you need \\set
and each has to be on its own line (and thus needs to end with \n
). Notice the \n
at the end of your first set
. In your INSERT
query, your start_epoch
and end_epoch
need to be called using the colon notation in your WHERE
statement e.g.
and ( (data.tstime >= :start_epoch) or (:start_epoch = -1) )
and ( (data.tsendtime <= :end_epoch) or (:end_epoch = -1) )
Are you running this line-by-line, or writing the query to file via echo
/cat
? e.g.
# turn off the expansion of the asterisk
set -f
echo -e $raster_sql > $raster_sql_file
cat $raster_sql_file | psql -h $db_host $db_name
@COBrogan @rburghol So more along the lines of this? I also don't have quotes around the epoch's right now, is that something I need because I assigned them? I was trying to follow the formatting of calc_raster_ts, but is that the best way to test to see if it works? or is there a different method I can use to run the function? Because if I follow the calc_raster_ts method don't we have to make a config file for it?
hourlyvarkeyID=$1
dailyvarkeyID=$2
start_date =-1
end_date =-1
if [ $# -gt 2 ]; then
start_date=$3
fi
if [ $# -gt 3 ]; then
end_date=$4
fi
if [ "$start_date" == "-1" ]; then
start_epoch='-1'
else
start_epoch=$(date -d "$start_date" +%s)
fi
if [ "$end_date" == "-1" ]; then
end_epoch='-1' # Or leave it undefined
else
end_epoch=$(date -d "$end_date" +%s)
fi
rastersql="
\\set hourlyvarkeyID '$hourlyvarkeyID' \n
\\set dailyvarkeyID '$dailyvarkeyID' \n
\\set start_epoch $start_epoch
\\set end_epoch $end_epoch
insert into tmp_dh_timeseries_weather ( tstime,tsendtime, featureid, entity_type, rast, bbox, varid)
select min(data.tstime),max(data.tsendtime), data.featureid, data.entity_type, ST_Union(data.rast, 'SUM') as rast, data.bbox, daily.hydroid as varid
from dh_timeseries_weather as data
left outer join tmp_dh_timeseries_weather as dupe
on (
data.tstime = dupe.tstime
and data.tsendtime = dupe.tsendtime
and data.varid = dupe.varid
and data.bbox = dupe.bbox
)
left outer join dh_variabledefinition as hour
on(
data.varid = hour.hydroid
)
left outer join dh_variabledefinition as daily
on(
daily.varkey = :'dailyvarkeyID'
)
where dupe.tid is null
AND hour.varkey = :'hourlyvarkeyID'
and ( (data.tstime >=:start_epoch) or (:start_epoch = -1) )
and ( (data.tsendtime <=:end_epoch) or (:end_epoch = -1) )
GROUP BY data.featureid, data.entity_type, data.bbox, daily.hydroid,
extract(day from to_timestamp(data.tsendtime)),
extract(year from to_timestamp(data.tsendtime)),
extract(month from to_timestamp(data.tsendtime));"
# turn off the expansion of the asterisk
set -f
echo -e $raster_sql > $raster_sql_file
cat $raster_sql_file | psql -h $db_host $db_name
Here is my updated sql with the hardset values, it doesn't work if it is a single day, I'm not sure why that would be, but when I choose two days it works, and creates 2 values. By saying less than or equal I would have thought it would return if it was a singular day. My thought is that when we do this we'll need hours included, and it will have to end with like '2021-05-02 01:00:00' or something along those lines. Here is what it's returning:
INSERT 0 2
insert into tmp_dh_timeseries_weather ( tstime,tsendtime, featureid, entity_type, rast, bbox, varid)
select min(data.tstime),max(data.tsendtime), data.featureid, data.entity_type, ST_Union(data.rast, 'SUM') as rast, data.bbox, daily.hydroid as varid
from dh_timeseries_weather as data
left outer join tmp_dh_timeseries_weather as dupe
on (
data.tstime = dupe.tstime
and data.tsendtime = dupe.tsendtime
and data.varid = dupe.varid
and data.bbox = dupe.bbox
)
left outer join dh_variabledefinition as hour
on(
data.varid = hour.hydroid
)
left outer join dh_variabledefinition as daily
on(
daily.varkey = 'nldas2_precip_daily'
)
where dupe.tid is null
AND hour.varkey = 'nldas2_obs_hourly'
and (data.tstime >= extract(epoch from ('2021-05-01'::timestamp)))
and (data.tsendtime <= extract(epoch from ('2021-05-02'::timestamp)))
GROUP BY data.featureid, data.entity_type, data.bbox, daily.hydroid,
extract(day from to_timestamp(data.tsendtime)),
extract(year from to_timestamp(data.tsendtime)),
extract(month from to_timestamp(data.tsendtime));
Check method to change this workflow into daily data, without temp tables, that inserts into dh_timerseries_weather using methods from calc_raster_ts and timestamp methods
We want to take the work from #86 and convert it into something more similar to raster_calc:
tsendtime
as the input, along with thevarkey
, setting the variables ahead of timedh_timeseries_weather
varkey
as an input do? Unless we have two separate ones so that if its changed in the future, it doesn't effect the code.nldas2_obs_hourly
, 2) the destination (daily data)nldas2_precip_daily
.Our Delete function for daily_precip
Our Insert function for daily_precip