Closed riyaeliza123 closed 2 months ago
Dataset: Tables to use: field, microtroll, cleaning. Some union/join with pit_tag.
Model: Decision tree, DL model (more features)
Dashboard: Dash app
When these conditions are added to the database, the output for species is the following:
Took some dummy data, created a pipeline for a decision tree, and plotted the decision tree.
The DL model using TensorFlow is complete, and predictions have been generated.
The predictions, however, are decimal values rather than 0/1 labels:
array([[0.39008173, 0.0754076 ],
[0.5425115 , 0.04349737],
[0.22640109, 0.11200137],
[0.24807785, 0.11598497],
[0.28531218, 0.11743941]],
The format we want is:
array([[1., 0.],
[1., 0.],
[1., 0.],
[1., 0.],
[1., 0.]])
The 2 methods I think we can use to achieve this (and my reasoning for/against):
I think we should go with method 1. If no labels are assigned, that means confidence is very low, and we output the result as "none" / "cannot determine".
All files (notebook, data, and decision tree plot) have been pushed.
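The two methods are not spelled out in this thread, so the following is only a sketch of two common ways to turn decimal outputs into 0/1 labels: taking the largest value per row (argmax) and applying a per-class cutoff. The function names and the 0.5 cutoff are assumptions for illustration, not taken from the notebook. On the five rows shown above, argmax reproduces the wanted array, while a 0.5 cutoff labels only the second row and leaves the other four unassigned.

```python
import numpy as np

# Model outputs copied from the issue (one row per fish, one column per class).
probs = np.array([
    [0.39008173, 0.0754076],
    [0.5425115, 0.04349737],
    [0.22640109, 0.11200137],
    [0.24807785, 0.11598497],
    [0.28531218, 0.11743941],
])

def one_hot_argmax(p):
    """Set the largest value in each row to 1 and everything else to 0."""
    out = np.zeros_like(p)
    out[np.arange(len(p)), p.argmax(axis=1)] = 1.0
    return out

def threshold_labels(p, threshold=0.5):
    """Set every value at or above the cutoff to 1 (a row may get no label)."""
    return (p >= threshold).astype(float)

def undetermined(labels):
    """Flag rows with no label at all, i.e. 'none' / 'cannot determine'."""
    return labels.sum(axis=1) == 0

print(one_hot_argmax(probs))
print(threshold_labels(probs))
print(undetermined(threshold_labels(probs)))
```

The cutoff variant is the one that can produce a "cannot determine" row; argmax always assigns exactly one label, however low the confidence.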
Remaining:
tasks:
Reminder: Goal is to be able to accurately impute data. Interpretability is an added appreciated feature.
SELECT pit.tag_id_long, field.watershed,
field.river, field.site,
field.method, field.local,
field.water_temp_start,
field.species, field.fork_length_mm
FROM pit_tag pit
INNER JOIN field ON pit.tag_id_long = field.tag_id_long
https://marinescience.info/sqllab/?savedQueryId=51
Explore genetics_field also
species count
rbt 979
ct 515
cm 10
co 27187
so 4
bt 77
stl 1765
ck 31810
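To make the counts above easier to compare, here is a small sketch that turns them into shares of the total. The counts are copied from the table; nothing else is assumed. The two largest classes (ck and co) together make up roughly 95% of the rows, which may be worth keeping in mind when judging model accuracy.

```python
# Species counts copied from the table above.
counts = {
    "rbt": 979, "ct": 515, "cm": 10, "co": 27187,
    "so": 4, "bt": 77, "stl": 1765, "ck": 31810,
}

total = sum(counts.values())
shares = {species: n / total for species, n in counts.items()}

# Print species from most to least common with their share of all rows.
for species, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{species:>4} {n:>6} {shares[species]:7.2%}")
```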
Questions to explore (feature importance):
Does river affect species?
Does site affect species?
Does method affect species?
SELECT field.watershed,
field.river, field.site,
field.method, field.local,
field.water_temp_start,
field.fork_length_mm, field.species
FROM field
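One way to start on the feature-importance questions is to fit a decision tree on the query result and read its `feature_importances_`. The sketch below assumes scikit-learn and pandas are available. The column names come from the query above, but the rows are hypothetical toy values made up for illustration, so the resulting numbers say nothing about the real data.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy rows standing in for the query result (values are made up).
rows = pd.DataFrame({
    "river": ["r1", "r1", "r2", "r2", "r1", "r2"],
    "site": ["s1", "s2", "s1", "s2", "s1", "s2"],
    "method": ["m1", "m1", "m2", "m2", "m2", "m1"],
    "fork_length_mm": [80, 85, 120, 130, 82, 125],
    "species": ["co", "co", "ck", "ck", "co", "ck"],
})

# One-hot encode the categorical columns; "=" keeps the source name recoverable.
X = pd.get_dummies(
    rows.drop(columns="species"),
    columns=["river", "site", "method"],
    prefix_sep="=",
    dtype=float,
)
y = rows["species"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Sum the importances of the one-hot columns back to their source feature.
importance = pd.Series(tree.feature_importances_, index=X.columns)
by_feature = importance.groupby(lambda col: col.split("=")[0]).sum()
print(by_feature.sort_values(ascending=False))
```

Grouping the one-hot columns back by their source column gives one score each for river, site, and method, which maps directly onto the three questions.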
Details: https://github.com/brahmwg/Bottlenecks_MDS_Capstone/blob/main/deliverables/species_prediction_model.md (Data that can be used is detailed above)
Data objective: Use tagging location data (plus any other required data)
Output objective: Understand missing species data
Output:
Example output: For genetic stock assignment, which utilizes a Bayesian informed model, there is a 0.75 probability minimum value that needs to be met for inclusion in analysis.
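If the same kind of minimum were applied to species predictions, the output step might look like the sketch below. The function name `report_species`, the dictionary input, and reusing 0.75 as the cutoff are assumptions for illustration; the thread only states that 0.75 is the minimum used for genetic stock assignment.

```python
def report_species(probabilities, min_probability=0.75):
    """Return (label, probability) for the most likely species.

    The label is "cannot determine" when the top probability is below the cutoff.
    """
    species, p = max(probabilities.items(), key=lambda kv: kv[1])
    if p >= min_probability:
        return species, p
    return "cannot determine", p

# Hypothetical model outputs for two fish.
print(report_species({"co": 0.8, "ck": 0.2}))
print(report_species({"co": 0.6, "ck": 0.4}))
```

Returning the probability alongside the label keeps low-confidence rows visible instead of silently dropping them.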