Closed AReyH closed 4 months ago
For each stage (i.e. hatchery/river, estuary, microtroll, adult), what are the number of tags deployed to date or over a specific time frame?
SELECT stage,
COUNT(DISTINCT(tag_id_long))
FROM hatch_tag
WHERE stage IS NOT NULL
AND date_time_release BETWEEN '2018-01-01' -- Change this for initial date
AND '2024-01-01' -- Change this for after date
GROUP BY stage
Edited query based on @brahmwg 's feedback
SELECT CASE
WHEN h.stage = 'NA' THEN 'Hatchery'
ELSE h.stage
END,
'hatchery' as tag_status,
COUNT(DISTINCT(h.tag_id_long)) AS tag_count
FROM hatch_tag h
WHERE h.tag_id_long IS NOT NULL
AND date_time_release > '2021-01-01'
GROUP BY h.stage
UNION ALL
SELECT 'Field' as stage,
COALESCE(CASE
WHEN f.tag_status = 'no tag' THEN 'notag'
ELSE f.tag_status END, 'NA') AS tag_status,
COUNT(DISTINCT(f.tag_id_long)) AS tag_count
FROM field f
WHERE f.tag_id_long IS NOT NULL
AND f.tag_status NOT IN ('recap', 'mort', 'reject')
AND date > '2021-01-01'
GROUP BY f.tag_status
UNION ALL
SELECT 'Microtroll' as stage,
COALESCE(m.tag_status,'NA'),
COUNT(DISTINCT(m.tag_id_long)) AS tag_count
FROM microtroll m
WHERE m.tag_id_long IS NOT NULL
AND m.tag_status NOT IN ('recap', 'mort', 'reject')
AND date > '2021-01-01'
GROUP BY m.tag_status
For the microtroll
table there is a 0
value for tag_status
. How should we handle this value?
This query may also need to consider the number of tags deployed in field
and in microtroll
. PIT tagging fish can occur at hatcheries, or in the wild during field work (at beaches, etc) or microtrolling. Be aware of the tag_status
column, which may indicate whether a particular fish was tagged, recaptured a previously tagged fish, etc.
this is what I have so far for: For each system/watershed, list all tags deployed that were subsequently redetected on the lowest mainstem array or in the estuary immediately after outmigration (to look at freshwater survival rather than overall survival and adult returns).
SELECT DETECTIONS.*
FROM DETECTIONS
LEFT JOIN LOCATION ON DETECTIONS.location = LOCATION.location_code
LEFT JOIN HATCH_TAG ON DETECTIONS.tagid = HATCH_TAG.tag_id_long
WHERE LOCATION.site_description LIKE 'Mainstem Array'
AND HATCH_TAG.system = LOCATION.watershed
AND LOCATION.subloc LIKE 'ds'
For each species in each system, list the numbers of hatchery and wild tags going out and the timing of those detections (to be able to compare hatchery and wild outmigrations).
number_release
within the Hatch_Param
table.)number_tagged
within the Hatch_Param
table.)SELECT hatch_param.species,
hatch_param.system,
hatch_param.tagging_date,
SUM(hatch_param.number_tagged) AS hatchery_tagged,
SUM(hatch_param.number_released) AS outmigration
FROM hatch_param
GROUP BY
hatch_param.species,
hatch_param.system,
hatch_param.tagging_date;
Modified as below after @brahmwg's comment:
SELECT hatch_tag.species,
hatch_tag.system,
STRING_AGG(field.date::TEXT, ', ') AS field_detection,
STRING_AGG(microtroll.date::TEXT, ', ') AS microtroll_detection,
COUNT(DISTINCT hatch_tag.tag_id_long) AS hatchery_count,
COUNT(DISTINCT field.tag_id_long) AS field_count,
COUNT(DISTINCT microtroll.tag_id_long) AS microtroll_count
FROM hatch_tag
LEFT JOIN field ON hatch_tag.tag_id_long = field.tag_id_long
LEFT JOIN microtroll ON hatch_tag.tag_id_long = microtroll.tag_id_long
GROUP BY hatch_tag.species,
hatch_tag.system;
For each species in each system, list the numbers of hatchery and wild tags going out and the timing of those detections (to be able to compare hatchery and wild outmigrations).
Uncertainties:
- How to count wild tags going out? (For now, I used
number_release
within theHatch_Param
table.)- How to count hatcheries? (For now, I used
number_tagged
within theHatch_Param
table.)- What should be the time frame of time of detection?
Wild tags should include tagging events in field
and microtroll
. You could imagine that tagging events take place at hatcheries (hatch_tag
), in the marine environment (microtroll
) and in between (in-river, beach/estuarine, etc. in field
)
Fish tagged at hatcheries are 100% hatchery-origin. Fish tagged in rivers, beach seines, or microtrolling can be hatchery or wild origin. Some, but not all hatchery fish are clipped so you can tell by looking at them (adipose fin removed) if they are a hatchery fish or not. So if a fish is caught in river, seine, or microtroll and has clip_status
= clip, then it is hatchery-origin. However, you could still have a fish with clip-stats = unclip that is also of hatchery-origin, it just wasn't clipped at the hatchery. This is where we need to link to our genetics results. If that individual was identified through PBT analysis, then it was a hatchery-origin fish. PBT or Parentage-Based-Tagging is something only done in hatcheries where they take the genetic information of the parent fish when pairing them to fertilize the females eggs. If the individual was identified to stock using GSI (Genetic Stock Information), then it was likely a wild fish.
Just to clarify; So in hatch_tag
and hatch_param
, system
= hatchery name, but in the field
and microtroll
tables, system
= river system that the fish is from (and could be the hatchery name if it turns out to be a hatchery fish or could be a wild fish)
Also, be aware of the tag_status
column, which indicates if a fish that was caught was tagged, recaptured a previously tagged fish, mort (died after tagging) or rejected the tag. Not every row in field
or microtroll
is necessarily a tagging event, it could be a recap or other.
For each system/watershed, list all tags deployed that were subsequently redetected on the lowest mainstem array or in the estuary immediately after outmigration (to look at freshwater survival rather than overall survival and adult returns).
I have yet included "estuary immediately after outmigration" to filter through date to ensure is not return, what should the window be? Can I use the same approach to determine fresh water instead of overall survival?
Lowest mainstem array can be found by looking at the lowest number location.location_code
for a particular watershed
. Also helpful to look at the site_description
. I have also quickly made a map displaying the locations of the antennas, cleaning tables, etc. The map can be seen here: https://marinescience.info/superset/explore/p/JNb87V0mMK1/
Note the legend: har = harvest, pre = predator, tag = tagging location, det = antenna detection location.
In field
, tag_status
= 'recap' where the method
= 'beach seine' would indicate that they survived the freshwater and made it to the estuarine environment.
Regarding the time window to determine if it is an outmigrant, our researchers have made R code with this decision rule in it. The R code is found in documentation/sample_code/detections. Alternatively, an arbitrary time would be at least 100 days from tagging date to detection.
This query may also need to consider the number of tags deployed in
field
and inmicrotroll
. PIT tagging fish can occur at hatcheries, or in the wild during field work (at beaches, etc) or microtrolling. Be aware of thetag_status
column, which may indicate whether a particular fish was tagged, recaptured a previously tagged fish, etc.
@brahmwg Thanks for the prompt reply! I have a few questions about the query:
hatch_tag
where the Stage
is microtroll, and the fish present in the table microtroll
?tag_status
column for this specific query? Are you guys interested only in a specific status for this query?
- What is the difference between the fish tagged in the table
hatch_tag
where theStage
is microtroll, and the fish present in the tablemicrotroll
?- What status should we be aware of in the
tag_status
column for this specific query? Are you guys interested only in a specific status for this query?
This table has now been updated. The stage is mostly null since they are all tagged at hatcheries in hatch_tag. To identify the stage you should look at viable tags deployed in hatcheries (hatch_tag
), in estuaries (field
where method
= 'beach seine', and in marine environment (microtroll
).
For tag_status
, should just be aware of 'recap', 'mort', 'reject' which should not be counted towards total tags deployed.
WITH table_1 AS(
SELECT tag_id_long, watershed, species, fork_length_mm, date,
MAX(date) OVER (PARTITION BY tag_id_long) AS last_date
FROM field
)
SELECT *, last_date - date AS no_of_days
FROM table_1
WHERE last_date - date > 100
This is for query 5 (For each species in each system/watershed, list the fork lengths of the fish tagged and identify which of those fish were detected as returning adults (compare size structure of initial and returning cohorts).)
This only returns 3 rows (could be because we have a lot of null values in tag_id_long)
Questions:
My progress:
@riyaeliza123 To improve this query, you'll want to join with the detections
table. The idea being to see the date fish are tagged in field
and the date they are detected in detections
. You'll be able to see the fork length at the tagging date in field
as well. You'll also want to look at the location where the tagging event occurred as well as the detection location.
WITH table_1 AS (
SELECT field.tag_id_long, field.watershed, field.species, field.fork_length_mm, field.date, det.last_date
FROM field
INNER JOIN (
SELECT DISTINCT d.tagid, MAX(DATE(d.datetime)) OVER (PARTITION BY d.tagid) AS last_date
FROM detections d
) det ON field.tag_id_long = det.tagid
)
SELECT *, last_date - date AS no_of_days
FROM table_1
WHERE last_date - date > 100
ORDER BY no_of_days DESC
-- 1397 rows returned
This is my latest query, looks like this is what we are looking for. https://marinescience.info/sqllab/?savedQueryId=45
Upon confirmation, i'll push this to the queries.md file
Currently, SQL query for Task 1 looks like this:
For each species in each system, list the numbers of hatchery and wild tags going out and the timing of those detections (to be able to compare hatchery and wild outmigrations).
SELECT hatch_tag.species,
hatch_tag.system,
COUNT(DISTINCT(hatch_tag.tag_id_long)) AS hatch_count,
COUNT(DISTINCT(field.tag_id_long)) AS field_count
FROM hatch_tag
FULL OUTER JOIN field ON CAST(hatch_tag.date_time_release AS DATE) = field.date
WHERE CAST(hatch_tag.date_time_release AS TIMESTAMP) > TIMESTAMP '2020-12-31'
GROUP BY hatch_tag.species, hatch_tag.system, field.date;
During our discussion yesterday, it was mentioned (If I remember correctly!) to use tag_status = 'clip'
from the field
table. However, if we add WHERE
condition based on that, no data is returned. Also, based on this query, only system
shown in the result is cowichan
.
Link to Query: https://marinescience.info/sqllab/?savedQueryId=46
For each system each year, look up what proportion of the total tags out were detected on seal haul outs or in heron rookeries (to be able to easily pull out predator data and calculate predation rates).
SELECT
main.system,
main.pred_count,
main.total_entries,
main.pred_count::INTEGER / main.total_entries::INTEGER AS proportion_over_count
FROM (
SELECT
hatch_tag.system,
SUM(CASE WHEN predator.predator IS NOT NULL THEN 1 ELSE 0 END) AS pred_count,
COUNT(*) AS total_entries
FROM
hatch_tag
LEFT JOIN
predator ON hatch_tag.tag_id_long = predator.tag_id_long
WHERE
predator.predator IN ('heron', 'sea haul') OR predator.predator IS NULL
GROUP BY
hatch_tag.system
) AS main;
This is for Task 6. It's returning all 0's for proportion_over_count
, which I think is an issue coming from division. It works fine when I tried changing division to multiplication. We can figure this out together.
This is my latest query, looks like this is what we are looking for. https://marinescience.info/sqllab/?savedQueryId=45
Upon confirmation, i'll push this to the queries.md file
Well done @riyaeliza123 ! I believe this is correct. I will run it by Sam & Jamieson to see if there's anything to modify, but it looks good to me.
Currently, SQL query for Task 1 looks like this:
For each species in each system, list the numbers of hatchery and wild tags going out and the timing of those detections (to be able to compare hatchery and wild outmigrations).
During our discussion yesterday, it was mentioned (If I remember correctly!) to use
tag_status = 'clip'
from thefield
table. However, if we addWHERE
condition based on that, no data is returned. Also, based on this query, onlysystem
shown in the result iscowichan
.Link to Query: https://marinescience.info/sqllab/?savedQueryId=46
@jlee2843 So you'll actually want to look at clip_status
= 'clip' [as opposed to tag_status
]to see if the fish was from a hatchery or wild. (fish that are clipped are from a hatchery, while unclipped may be wild or may be hatchery {this would require further exploration with the genetic data where available})
So this query will be a bit more involved; You may want to consider using views or CTEs for this. You can create a view based on a query (using the down arrow and CREATE VIEW next to the RUN button) and refer to it like any other table in other queries. You could accomplish the same with subqueries or CTEs though.
You'll want to look at hatch_tag
counts (grouped by species
and system
) and also join those tag_id_long
in hatch_tag
with the tagids in detections
. The idea is to see tags deployed at hatcheries which are detected later and where they are detected. This could all be a query and the result set used as a view to be included in a later part of this larger query.
You'll want to do a similar process for the field table, but it gets trickier because of the clip_status
. So you'll want to look at tags deployed (tag_status
= 'tag') and make a count grouped by clip_status
, to give some preliminary insight into the number of hatchery (clip_status
= 'clip') and [possibly] wild (clip_status
= 'unclip') tags deployed in field
. Similar to the hatch_tag
table, you'll want to join this with the detections
table to see which tagids are detected and when/where. This may also be a query that feeds into a view, which can be used in a later query to provide an overall summary.
After those are done, you may want to write another query with summarizes the output of the above queries in one. This is a challenging query so please feel free to ask me for help. I suggest using views as a way to break the problem down into smaller components, but using views is not strictly necessary and you may be able to solve it in other ways.
For each system each year, look up what proportion of the total tags out were detected on seal haul outs or in heron rookeries (to be able to easily pull out predator data and calculate predation rates).
@jlee2843 I would start by running a query to determine the total tags out for each system
and year
. So this will involve looking at the hatch_tag
, field
, & microtroll
tables and getting a count grouped by system
& year
where the tag_status
= 'tag'. This may also be a view to be referred to by later queries.
Once you know the total tags out for each system and year, you can compare that with the tags detected at heron rookeries and seal haul outs. This is made a bit easier by the predator
column in the predator
table, which indicates if a particular detection was found at a heron rookery or seal haul out. A simple query provides a count of total tag detections at each;
select count(*), predator from predator group by predator
The tricky part will be relating these to the appropriate system
and year
.
year
could be extracted from the date
column in predator
.
But for system
, you could group by location_name
in predator
, and then determine which system
these location_name
s belong to, but that may need some advice from our biologists since the location_name
s don't join perfectly with the location
table. Let me know when you get to this point and we can help solve that part of the query.
Hello @brahmwg, I am working on "For each system/watershed, list all tags deployed that were subsequently redetected on the lowest mainstem array or in the estuary immediately after outmigration (to look at freshwater survival rather than overall survival and adult returns)." This is what I have so far; my thinking process is to have 4 subqueries, consider two tagging tables and two detecting/recapturing tables. I have a question on incorporating the greater than 100 days threshold for outmigration- for those who's first tag being at hatchery, there is no exact date data for me to calculate for 100 days in the HATCH_TAG
table. With such cases- how should I design the first two subqueries to fulfill such requirement?
My query is below, the only part missing now is the location data to determine for down stream and method
= beach seine
as you mentioned above.
WITH
HATCH_TAG_FIELD AS (
SELECT DISTINCT HATCH_TAG.tag_id_long
FROM HATCH_TAG
INNER JOIN FIELD ON HATCH_TAG.tag_id_long = FIELD.tag_id_long
WHERE FIELD.tag_status = 'recap'
),
HATCH_TAG_DETECTIONS AS (
SELECT DISTINCT HATCH_TAG.tag_id_long
FROM HATCH_TAG
INNER JOIN DETECTIONS ON HATCH_TAG.tag_id_long = DETECTIONS.tagid
WHERE HATCH_TAG.tag_id_long NOT IN (SELECT tag_id_long FROM HATCH_TAG_FIELD)
),
FIELD_FIELD_DETECTIONS AS (
SELECT DISTINCT FIELD1.tag_id_long
FROM FIELD AS FIELD1
INNER JOIN FIELD AS FIELD2 ON FIELD1.tag_id_long = FIELD2.tag_id_long
WHERE FIELD1.tag_id_long NOT IN (SELECT tag_id_long FROM HATCH_TAG_FIELD)
AND FIELD1.tag_id_long NOT IN (SELECT tag_id_long FROM HATCH_TAG_DETECTIONS)
AND FIELD1.tag_status = 'tag'
AND FIELD2.tag_status = 'recap'
AND DATE(FIELD2.date) - DATE(FIELD1.date) > 100
),
FIELD_DETECTIONS AS (
SELECT DISTINCT FIELD.tag_id_long
FROM FIELD
INNER JOIN DETECTIONS ON FIELD.tag_id_long = DETECTIONS.tagid
WHERE FIELD.tag_id_long NOT IN (SELECT tag_id_long FROM HATCH_TAG_FIELD)
AND FIELD.tag_id_long NOT IN (SELECT tag_id_long FROM HATCH_TAG_DETECTIONS)
AND FIELD.tag_id_long NOT IN (SELECT tag_id_long FROM FIELD_FIELD_DETECTIONS)
AND FIELD.tag_status = 'tag'
AND DATE(DETECTIONS.datetime) - DATE(FIELD.date) > 100
)
SELECT * FROM HATCH_TAG_FIELD
UNION
SELECT * FROM HATCH_TAG_DETECTIONS
UNION
SELECT * FROM FIELD_FIELD_DETECTIONS
UNION
SELECT * FROM FIELD_DETECTIONS;
Hi @rafecchang , I will take a closer look a bit later and update my comment. At the moment,
I have a question on incorporating the greater than 100 days threshold for outmigration- for those who's first tag being at hatchery, there is no exact date data for me to calculate for 100 days in the HATCH_TAG table. With such cases- how should I design the first two subqueries to fulfill such requirement?
You could use the date_time_release
column from the hatch_tag
table for this. There are ~7k results in hatch_tag
with non-null values for date_time_release
. Alternatively, you will need to join with hatch_param
based on the system
, cohort
& outmigration_y
to find the tagging_date
from hatch_param
for the related rows in hatch_tag
.
My query is below, the only part missing now is the location data to determine for down stream and method = beach seine as you mentioned above.
As you've pointed out, the subloc
column from the location
table indicates if a particular antenna array is upstream (us
) or downstream (ds
).
@brahmwg Hello Brahm, I'm trying to break down Query 1 component by component.
You'll want to look at hatch_tag counts (grouped by species and system) and also join those tag_id_long in hatch_tag with the tagids in detections. The idea is to see tags deployed at hatcheries which are detected later and where they are detected. This could all be a query and the result set used as a view to be included in a later part of this larger query.
SELECT h.species, h.system, f.clip_status, h.tag_id_long, COUNT(h.tag_id_long) AS hatchery_deployed FROM hatch_tag h JOIN field f ON h.tag_id_long = f.tag_id_long WHERE f.tag_status = 'tag' AND h.clip_status = 'C' AND f.clip_status = 'unclip' GROUP BY f.clip_status, h.species, h.system, h.tag_id_long
The query above is used to make a VIEW called
hatceryField3
. This confuses me a bit since there is no entry inclip_status
ofhatchery_tag
marked as 'clip'. All entries are either 'C' or NULL. Hence I usedh.clip_status = 'C'
in the query above, but doing this only returns one row.You'll want to do a similar process for the field table, but it gets trickier because of the clip_status. So you'll want to look at tags deployed (tag_status = 'tag') and make a count grouped by clip_status, to give some preliminary insight into the number of hatchery (clip_status = 'clip') and [possibly] wild (clip_status = 'unclip') tags deployed in field. Similar to the hatch_tag table, you'll want to join this with the detections table to see which tagids are detected and when/where. This may also be a query that feeds into a view, which can be used in a later query to provide an overall summary.
SELECT h.species, h.system, h.tag_id_long, COUNT(h.tag_id_long) FROM hatch_tag h JOIN detections d ON h.tag_id_long = d.tagid GROUP BY h.species, h.system, h.tag_id_long
This will be used in creating another VIEW called
hatcheryDetections
.
I don't think I am understanding what the expected output is for the Query 1. In our previous conversation, there was also a mention of looking into dates that are beyond 2021, but is that not required anymore? Also there was a mention of using cohorts as well, but I am unsure how that can be tied to the queries above. It would be a great help to me if you can specify how you would want the final query output to be like.
@brahmwg Here is the revised Query 6, addressing your feedbacks from the comment above!
WITH predator_counts AS (
SELECT EXTRACT(YEAR FROM predator.date) AS year,
SUM(CASE WHEN predator.predator = 'heron' THEN 1 ELSE 0 END) AS heron_count,
SUM(CASE WHEN predator.predator = 'sea houl' THEN 1 ELSE 0 END) AS sea_houl_count,
location.site_name AS location
FROM predator
JOIN location ON predator.location_name = location.site_name
GROUP BY year, location.site_name
),
hatch_tag_count AS (
SELECT h.outmigration_y, h.system, COUNT(h.tag_id_long) AS tag_count
FROM hatch_tag h
LEFT JOIN field f ON h.tag_id_long = f.tag_id_long
LEFT JOIN microtroll m ON h.tag_id_long = m.tag_id_long
GROUP BY h.outmigration_y, h.system
)
SELECT year, heron_count, sea_houl_count
FROM predator_counts
JOIN hatch_tag_count ON predator_counts.location = hatch_tag_count.system
WHERE year > 2020;
I believe that JOIN location ON predator.location_name = location.site_name
this part still needs confirmation. As of now, no output is returned. Saved query can be found here.
Hello @brahmwg here is the query I have now:
WITH
HATCH_TAG_FIELD AS (
SELECT DISTINCT HATCH_TAG.tag_id_long, HATCH_TAG.system
FROM HATCH_TAG
INNER JOIN FIELD ON HATCH_TAG.tag_id_long = FIELD.tag_id_long
LEFT JOIN LOCATION AS LOCATION1 ON LOWER(HATCH_TAG.system) = LOWER(LOCATION1.watershed)
LEFT JOIN LOCATION AS LOCATION2 ON LOWER(FIELD.site) = LOWER(LOCATION2.site_description)
WHERE FIELD.tag_status = 'recap'
AND LOCATION2.location_code IN ('3c', '3d', '3e', '3f', '2d', '41', '7e')
AND LOCATION1.watershed = LOCATION2.watershed
AND DATE(FIELD.date) - DATE(HATCH_TAG.date_time_release) > 100
),
HATCH_TAG_DETECTIONS AS (
SELECT DISTINCT HATCH_TAG.tag_id_long, HATCH_TAG.system
FROM HATCH_TAG
INNER JOIN DETECTIONS ON HATCH_TAG.tag_id_long = DETECTIONS.tagid
LEFT JOIN LOCATION AS LOCATION1 ON LOWER(HATCH_TAG.system) = LOWER(LOCATION1.watershed)
LEFT JOIN LOCATION AS LOCATION2 ON LOWER(DETECTIONS.location) = LOWER(LOCATION2.location_code)
WHERE HATCH_TAG.tag_id_long NOT IN (SELECT tag_id_long FROM HATCH_TAG_FIELD)
AND LOCATION2.subloc = 'ds'
AND DATE(DETECTIONS.datetime) - DATE(HATCH_TAG.date_time_release) > 100
),
FIELD_FIELD_DETECTIONS AS (
SELECT DISTINCT FIELD1.tag_id_long, FIELD1.watershed
FROM FIELD AS FIELD1
INNER JOIN FIELD AS FIELD2 ON FIELD1.tag_id_long = FIELD2.tag_id_long
LEFT JOIN LOCATION AS LOCATION1 ON LOWER(FIELD1.site) = LOWER(LOCATION1.site_description)
LEFT JOIN LOCATION AS LOCATION2 ON LOWER(FIELD2.site) = LOWER(LOCATION2.site_description)
WHERE FIELD1.tag_id_long NOT IN (SELECT tag_id_long FROM HATCH_TAG_FIELD)
AND FIELD1.tag_id_long NOT IN (SELECT tag_id_long FROM HATCH_TAG_DETECTIONS)
AND FIELD1.tag_status = 'tag'
AND FIELD2.tag_status = 'recap'
AND DATE(FIELD2.date) - DATE(FIELD1.date) > 100
AND (
(LOCATION1.location_code = '39' AND LOCATION2.location_code IN ('3c', '3d', '3f'))
OR (LOCATION1.location_code IN ('2c', '2f', '2e') AND LOCATION2.location_code = '2d')
OR (LOCATION1.location_code IN ('43', '42') AND LOCATION2.location_code = '41')
OR (LOCATION1.location_code IN ('79', '78', '7f') AND LOCATION2.location_code = '7e')
)
),
FIELD_DETECTIONS AS (
SELECT DISTINCT FIELD.tag_id_long, FIELD.watershed
FROM FIELD
INNER JOIN DETECTIONS ON FIELD.tag_id_long = DETECTIONS.tagid
LEFT JOIN LOCATION ON DETECTIONS.location = LOCATION.location_code
WHERE FIELD.tag_id_long NOT IN (SELECT tag_id_long FROM HATCH_TAG_FIELD)
AND FIELD.tag_id_long NOT IN (SELECT tag_id_long FROM HATCH_TAG_DETECTIONS)
AND FIELD.tag_id_long NOT IN (SELECT tag_id_long FROM FIELD_FIELD_DETECTIONS)
AND FIELD.tag_status = 'tag'
AND DATE(DETECTIONS.datetime) - DATE(FIELD.date) > 100
AND LOCATION.subloc = 'ds'
)
SELECT * FROM HATCH_TAG_FIELD
UNION
SELECT * FROM HATCH_TAG_DETECTIONS
UNION
SELECT * FROM FIELD_FIELD_DETECTIONS
UNION
SELECT * FROM FIELD_DETECTIONS;
edit: I consider the relative geographical locations by looking at the map for field. For detection, I used 'ds' only
@jlee2843 I will try my best to address all of your questions here, and I will come back to update my comment as I review your query;
This confuses me a bit since there is no entry in clip_status of hatchery_tag marked as 'clip'.
You can assume all tags deployed at hatcheries have clip_status
= 'clip'. In other tables, when a fish is caught and the researcher notices that it has it's fin clipped, that is an indicator that the fish originated from a hatchery.
The original query (query 1) in english:
For each species in each system, list the numbers of hatchery and wild tags going out and the timing of those detections (to be able to compare hatchery and wild outmigrations).
Starting with the first section "For each species in each system, list the numbers of hatchery and wild tags going out", we would need to use an aggregate function like COUNT(*)
and GROUP BY
species
& system
. The query below would produce those summaries for the hatch_tag
table;
-- This query produces a summary of counts of tags deployed by system and species for hatcheries
SELECT COUNT(*), system, species
from hatch_tag
group by system, species
For the wild tags, we can do a similar summary for the field
table. Notably, we would want to only include tag_status
= 'tag' (so we exclude recaps, morts, rejects) and to filter by clip_status
= 'unclip' to exclude fish that are clipped (and therefore originated from a hatchery).
-- This query produces summary counts of tags deployed by watershed & species for wild tags in the field table
SELECT count(*), watershed, species
from field
where tag_status = 'tag'
and clip_status= 'unclip'
group by watershed, species
The next part of the original query question "and the timing of those detections (to be able to compare hatchery and wild outmigrations)" will require looking at the individual tags and not the aggregations. For this you'll want to join the tags deployed at hatcheries (hatch_tag
) with tags detected by antennas (detections
) to understand which tags deployed at hatcheries are being detected later at downstream antennas (subloc
= 'ds'). You'll want to do the same procedure but for the wild tags in field
; so use a query to filter by tag_status
= 'tag' and clip_status
='unclipped' to produce a list of tag_id_long
s which represent wild tags from the field
table, and then join those tagids with detections
to again see which tagids are being detected at downstream antennas. You may calculate the difference in days between the tagging event (field.date
for the wild tags, hatch_tag.date_time_release
for the hatchery tags) and the detection event (detections.datetime
).
The final output of this would list tags, indicating the species, system/watershed, if the tag is wild or hatchery origin, the difference in days between tag and detection, etc.
For query 6;
SUM(CASE WHEN predator.predator = 'sea houl' THEN 1 ELSE 0 END) AS sea_houl_count,
This won't produce any results because 'sea houl' is not a valid value for the column predator.predator
. You can use SELECT DISTINCT predator FROM ods.predator
to see the range of possible values for that column; there are only two, heron or pinniped.
The original text for query 6;
For each system each year, look up what proportion of the total tags out were detected on seal haul outs or in heron rookeries (to be able to easily pull out predator data and calculate predation rates).
So first you'll want to calculate the total tags out for each system
and year
. This will require some queries like we did above, to get a count of total tags deployed grouped by system
and year
for the hatch_tag
, field
, and microtroll
tables. For field
& microtroll
, be sure to filter where tag_status
= 'tag'. You could then SUM()
the counts from each of those tables to give a total count of tags grouped by system
& year
. You would then know the total tags out (sum of counts for hatch_tag
, field
, & microtroll
).
Next, you'd want to get a count of tags detected in seal haul outs or heron rookeries. The query below can provide that summary;
-- This query provides a count of tags detected at either seal haul outs or heron rookeries
select count(*), predator, location_name
from predator
group by predator, location_name
You might then be able to do some general comparison to see out of all tags out in a given year, how many were detected at heron/pinniped locations. You could extract year from predator.date
to compare totals on a per year basis. For location comparisons, it will be trickier so you can omit that part for now.
@AReyH I responded to your query over slack. Mainly the results for hatchery seem inaccurate due to NULL
values for dates. I created a view / query to retrieve the tagging_date
from hatch_param
and link it with the hatch_tag
table. Also, may need to include a stage for 'estuary' which should include tagging events from field
where the method
='beach seine'.
@brahmwg Hello Brahm, thank you for your feedback. I revised my Query 1 based on your clarifications, and this is what I have now:
WITH hatchTagCount AS (
SELECT COUNT(*) AS count, system, species, date_time_release::DATE, tag_id_long, 'hatchery' AS origin
FROM hatch_tag
GROUP BY system, species, date_time_release, tag_id_long
), fieldCount AS (
SELECT COUNT(*) AS count, watershed, species, date, tag_id_long, 'field' AS origin
FROM field
WHERE tag_status = 'tag'
AND clip_status = 'unclip'
GROUP BY watershed, species, date, tag_id_long
), locationDetection AS (
SELECT *, detections.datetime::DATE AS date
FROM detections
LEFT JOIN location ON detections.location = location.location_code
WHERE location.subloc = 'ds'
), hatchTagDetectionList AS (
SELECT hatch_tag.tag_id_long, hatchTagCount.species, hatchTagCount.system, hatchTagCount.date_time_release, hatchTagCount.origin, locationDetection.date AS detection_date
FROM hatch_tag
LEFT JOIN locationDetection ON hatch_tag.tag_id_long = locationDetection.tag_id
LEFT JOIN hatchTagCount ON hatch_tag.tag_id_long = hatchTagCount.tag_id_long
WHERE hatchTagCount.date_time_release IS NOT NULL AND locationDetection.date IS NOT NULL
), fieldDetectionList AS (
SELECT field.tag_id_long, locationDetection.datetime, field.watershed, fieldCount.species, field.date, fieldCount.origin, locationDetection.date AS detection_date
FROM field
LEFT JOIN locationDetection ON field.tag_id_long = locationDetection.tagid
LEFT JOIN fieldCount ON field.tag_id_long = fieldCount.tag_id_long
WHERE field.tag_status = 'tag' AND field.clip_status = 'unclip' AND field.date IS NOT NULL
), fieldComplete AS (
SELECT fieldDetectionList.tag_id_long,
fieldDetectionList.watershed AS "system/watershed",
fieldDetectionList.species,
fieldDetectionList.origin,
ROUND(EXTRACT(EPOCH FROM (fieldDetectionList.datetime - fieldDetectionList.date)) / 86400, 3) AS diff_date
FROM fieldDetectionList
), hatchComplete AS (
SELECT hatchTagDetectionList.tag_id_long,
hatchTagDetectionList.system AS "system/watershed",
hatchTagDetectionList.species,
hatchTagDetectionList.origin,
ROUND(EXTRACT(EPOCH FROM (hatchTagDetectionList.detection_date - hatchTagDetectionList.date_time_release)) / 86400, 3) AS diff_date
FROM hatchTagDetectionList
)
https://marinescience.info/sqllab?savedQueryId=52
Can you let me know if I am going in the right direction? I think the fieldComplete
query is looking fine, but hatchTagDetectionList
does not return any rows because based on what I have, locationDetection.date
is all NULL values.
When I figure out what is wrong with the locationDetection.date
in hatchTagDetectionList
, I plan on concatenating fieldComplete
and hatcheryComplete
. I would really appreciate it if you can let me know if this is going in the direction that you are expecting me to!
Hi @jlee2843 , I replied to your latest question over slack. I think you are on the right track, and I created some views which should hopefully simplify your query.
Resolved and complete
This is for us to share the status of the queries we have so far.