baumlab / open-science-project

Ideas, data, R scripts, figures, and writing for Baum lab #OpenDerby

Dealing with differences in trade volume? #8

Closed JamieMcDevittIrwin closed 6 years ago

JamieMcDevittIrwin commented 6 years ago

Some species have huge outliers in trade volume, so we decided to use species occurrences instead of volume. Our response variable will therefore be whether or not a species was exported (0/1, binomial).
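A minimal sketch of the volume-to-occurrence step, written in Python rather than the project's R. It assumes each trade record is a dict with hypothetical `species`, `country` and `volume` keys. Any positive volume becomes a 1, so a single huge outlier counts the same as a small shipment, and species-country pairs with no record become explicit 0s.

```python
from itertools import product


def to_occurrence(records, species, countries):
    """Return {(species, country): 0 or 1} for every species-country pair.

    `records` is assumed to be an iterable of dicts with 'species',
    'country' and 'volume' keys (hypothetical field names).
    """
    # Any positive export volume counts as an occurrence, however large.
    exported = {
        (r["species"], r["country"]) for r in records if r["volume"] > 0
    }
    # Fill the full species x country grid so absences become explicit 0s.
    return {
        (sp, c): int((sp, c) in exported)
        for sp, c in product(species, countries)
    }


# Made-up records: the huge volume for sp_A becomes a plain 1.
records = [
    {"species": "sp_A", "country": "country_1", "volume": 250_000},
    {"species": "sp_A", "country": "country_2", "volume": 12},
    {"species": "sp_B", "country": "country_1", "volume": 40},
]
occ = to_occurrence(records, ["sp_A", "sp_B"], ["country_1", "country_2"])
print(occ[("sp_B", "country_2")])  # 0: no sp_B record for country_2
```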

In addition, we decided to look only at the top 100 species by export volume (using just the full years 2008, 2009 and 2010, no modelled data).
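The top-100 selection can be sketched as "sum volume per species over the chosen years, sort, keep the first n". This is a hedged Python sketch, not the logic in load_clean_trade_data.R. Field names are hypothetical, and `years` defaults to the three full years mentioned here, so passing a different set of years changes which species make the cut.

```python
from collections import defaultdict

# Full (non-modelled) years named in this thread.
FULL_YEARS = (2008, 2009, 2010)


def top_species_by_volume(records, n=100, years=FULL_YEARS):
    """Rank species by total export volume summed over `years`.

    Records are assumed to be dicts with 'species', 'year' and 'volume'
    keys (hypothetical field names). Ties are broken alphabetically so
    repeated runs give the same list.
    """
    totals = defaultdict(float)
    for r in records:
        if r["year"] in years:
            totals[r["species"]] += r["volume"]
    ranked = sorted(totals, key=lambda sp: (-totals[sp], sp))
    return ranked[:n]


# Made-up records; the 2004 (partial-year) row is ignored by default.
records = [
    {"species": "sp_A", "year": 2008, "volume": 10},
    {"species": "sp_B", "year": 2009, "volume": 50},
    {"species": "sp_C", "year": 2004, "volume": 999},
    {"species": "sp_A", "year": 2010, "volume": 45},
]
print(top_species_by_volume(records, n=2))  # ['sp_A', 'sp_B']
```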

However, should we keep the dataframe as it is, with species exported by multiple countries in multiple years? Currently the top 100 csv is saved with only country included, so it doesn't capture species occurrences across years.
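To make the two layouts concrete, here is a Python sketch that assumes the same hypothetical record fields plus a `year` key. The per-year version keeps one occurrence per species, country and year. The collapsed version keeps only species and country, as the current top 100 csv does, with a count of years standing in for the dropped year information.

```python
def occurrences_by_year(records):
    """Keep the year: one (species, country, year) tuple per occurrence."""
    return {(r["species"], r["country"], r["year"]) for r in records}


def collapse_years(by_year):
    """Drop the year: count the distinct years each species-country pair appears in."""
    counts = {}
    for sp, country, _year in by_year:
        counts[(sp, country)] = counts.get((sp, country), 0) + 1
    return counts


# Made-up records; the duplicated 2009 row collapses into one occurrence.
records = [
    {"species": "sp_A", "country": "country_1", "year": 2008},
    {"species": "sp_A", "country": "country_1", "year": 2009},
    {"species": "sp_A", "country": "country_1", "year": 2009},
    {"species": "sp_A", "country": "country_2", "year": 2010},
]
by_year = occurrences_by_year(records)
print(len(by_year))  # 3 species-country-year occurrences
print(sorted(collapse_years(by_year).items()))
# [(('sp_A', 'country_1'), 2), (('sp_A', 'country_2'), 1)]
```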

jpwrobinson commented 6 years ago

On the modelled/non-modelled stuff: modelling is only used to scale up trade volumes from monthly to yearly estimates. Species presence/absence itself can't be modelled, so we can use the non-modelled data, because partial years still contain information about which species each country exports.

It also means that partial years ('modelled') have lower sampling effort than full years (2008, 2009, 2010), so we might underestimate the true number of species exported by each country from 2000 to 2010.

To me, the options are:

1) Aggregate across years and assume that countries export the common species every year, so that we have reasonably detected the true number of countries that export each species.

2) Control for partial sampling with an offset or random covariate that identifies whether species were sampled in full years (2008-10) or partial years (2000, 2004, 2005).
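For option 2, the data-prep side might look like the Python sketch below. The year groups come from this comment. The (species, country, year, exported) tuple layout is an assumption. The flagged rows would then go into a binomial model with `full_year` as the covariate. The model itself is not shown.

```python
# Year groups as listed in this thread.
FULL_YEARS = {2008, 2009, 2010}
PARTIAL_YEARS = {2000, 2004, 2005}


def add_sampling_flag(rows):
    """Append a full_year covariate: 1 for a full year, 0 for a partial year.

    `rows` are assumed to be (species, country, year, exported) tuples.
    Years in neither group raise an error instead of being guessed.
    """
    flagged = []
    for sp, country, year, exported in rows:
        if year in FULL_YEARS:
            full_year = 1
        elif year in PARTIAL_YEARS:
            full_year = 0
        else:
            raise ValueError(f"year {year} is not listed as full or partial")
        flagged.append((sp, country, year, exported, full_year))
    return flagged


rows = [("sp_A", "country_1", 2004, 1), ("sp_A", "country_1", 2009, 0)]
print(add_sampling_flag(rows))
# [('sp_A', 'country_1', 2004, 1, 0), ('sp_A', 'country_1', 2009, 0, 1)]
```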

It would be good to check how often species appear in both the partial and full datasets, per country. That would give an idea of how consistent the export records are. We can't do this until Rhyne gets back to us about the weirdness in the non-modelled records.
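The per-country consistency check could be sketched in Python like this. It assumes the partial-year and full-year records have already been reduced to sets of (country, species) pairs. A high count in `both` relative to the `*_only` counts would suggest export records are consistent between partial and full years.

```python
from collections import defaultdict


def overlap_by_country(partial, full):
    """Per country, count species found in both datasets or in only one.

    `partial` and `full` are assumed to be sets of (country, species)
    pairs taken from partial-year and full-year records respectively.
    """
    summary = defaultdict(lambda: {"both": 0, "partial_only": 0, "full_only": 0})
    for country, sp in partial | full:
        in_partial = (country, sp) in partial
        in_full = (country, sp) in full
        if in_partial and in_full:
            summary[country]["both"] += 1
        elif in_partial:
            summary[country]["partial_only"] += 1
        else:
            summary[country]["full_only"] += 1
    return dict(summary)


# Made-up example sets.
partial = {("country_1", "sp_A"), ("country_1", "sp_B"), ("country_2", "sp_A")}
full = {("country_1", "sp_A"), ("country_1", "sp_C")}
print(overlap_by_country(partial, full)["country_1"])
# {'both': 1, 'partial_only': 1, 'full_only': 1}
```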

JamieMcDevittIrwin commented 6 years ago

Ok! That sounds good.

Quick question: wouldn't we then want the top 100 species to come from both full and partial years? Looking at the script that makes the top 100 ("load_clean_trade_data.R"), it looks like only the full years were included.

jpwrobinson commented 6 years ago

You sure? Run that script down to line 50 (after aggregating top 100) and check how many years are in the 'trade' dataset. I see 6 years.

Line 22 is hashed out

JamieMcDevittIrwin commented 6 years ago

You're totally right... Thanks!

jpwrobinson commented 6 years ago

Any idea why the Genus, vulnerability, Aquarium, Length etc. columns came out as NA from rfishbase? Seems weird it'd pull diet info but not Genus.

Wasn't sure if I specified the var names correctly on lines 63-70 of load_clean_trade_data.R.

Can you check? It still won't work on my end because of my internet.

JamieMcDevittIrwin commented 6 years ago

Yup! Still running now, it's taking forever.

JamieMcDevittIrwin commented 6 years ago

Yeah, there is something wrong: the information is there but didn't get matched into the trade table. And now neither the diet nor the species info is matching the trade table. Ugh.

I'll keep working on this! The info is there, I'm just having issues making it match.
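One way to see where matching fails is to join explicitly and list the trade species that find no partner. This is a minimal Python sketch assuming exact string matching on a species name. It does not say what went wrong in load_clean_trade_data.R. Differences in spelling, case or stray whitespace are one common reason two tables that both "have the info" still don't join.

```python
def left_join_species_info(trade, info):
    """Attach species-level info to each trade row by exact species name.

    `trade` is a list of dicts with a 'species' key; `info` maps a species
    name to a dict of traits (both layouts are assumptions). Returns the
    joined rows and the set of trade species with no exact match, which
    is where NA values would come from.
    """
    joined, unmatched = [], set()
    for row in trade:
        traits = info.get(row["species"])
        if traits is None:
            unmatched.add(row["species"])
            traits = {}
        joined.append({**row, **traits})
    return joined, unmatched


# Made-up data: the trailing space stops the second row from matching.
trade = [
    {"species": "sp_A", "country": "country_1"},
    {"species": "sp_A ", "country": "country_2"},
]
info = {"sp_A": {"Genus": "Genus_A"}}
joined, unmatched = left_join_species_info(trade, info)
print(unmatched)  # {'sp_A '}
```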

jpwrobinson commented 6 years ago

Looking at the new Rdata files and everything looks good. Was it a matching problem?

jpwrobinson commented 6 years ago

Oh no, it just looks like FishBase has identified some species under totally different names. I'm getting all sorts of weird species appearing in the spec and ecol dataframes (cod!).
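A cheap sanity check for this kind of mix-up is to flag any name whose genus changed during validation. The Python sketch below assumes a mapping from each trade-data name to its validated name, with `None` for unresolved names. The genus is taken as the first word of the binomial. Legitimate reclassifications also change the genus, so flagged entries need checking by hand rather than automatic removal.

```python
def flag_genus_changes(name_map):
    """Return {original: validated} where validation changed the genus.

    `name_map` maps each trade-data species name to the name returned by
    name validation (None if unresolved). The genus is taken to be the
    first word of each binomial.
    """
    flagged = {}
    for original, validated in name_map.items():
        if validated is None:
            continue  # unresolved names are a separate problem
        if original.split()[0] != validated.split()[0]:
            flagged[original] = validated
    return flagged


# Made-up mapping: the second entry has been swapped for cod.
name_map = {
    "Genusa alpha": "Genusa alpha",
    "Genusb beta": "Gadus morhua",
    "Genusc gamma": None,
}
print(flag_genus_changes(name_map))  # {'Genusb beta': 'Gadus morhua'}
```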

jpwrobinson commented 6 years ago

I'm trying to run validate_names again but it's taking too long. I can't see why the ecol extraction works but the species one doesn't: it uses the same fish names already validated against FishBase??

It looks like ecol has worked, but spec has completely different species. Any ideas?

JamieMcDevittIrwin commented 6 years ago

Ha. I'll look at this today! Yeah, it's super weird...

JamieMcDevittIrwin commented 6 years ago

Fixed!! :) Not sure how or why, but I started from scratch, reran everything, and now it's working, even though I did the same thing on Friday and it didn't work. I'll double-check later to make sure it still works.

jpwrobinson commented 6 years ago

Nice one. I also made a top 100 by number of exporting countries.
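Ranking by number of exporting countries only swaps the summed volume for a count of distinct countries. This Python sketch uses hypothetical `species` and `country` fields and breaks ties alphabetically. It is not necessarily how the R version does it.

```python
def top_species_by_countries(records, n=100):
    """Rank species by the number of distinct countries exporting them.

    Records are assumed to be dicts with 'species' and 'country' keys
    (hypothetical field names); ties are broken alphabetically.
    """
    countries = {}
    for r in records:
        countries.setdefault(r["species"], set()).add(r["country"])
    ranked = sorted(countries, key=lambda sp: (-len(countries[sp]), sp))
    return ranked[:n]


# Made-up records; the repeated sp_A/country_2 row is only counted once.
records = [
    {"species": "sp_A", "country": "country_1"},
    {"species": "sp_A", "country": "country_2"},
    {"species": "sp_A", "country": "country_2"},
    {"species": "sp_B", "country": "country_1"},
    {"species": "sp_C", "country": "country_1"},
    {"species": "sp_C", "country": "country_2"},
    {"species": "sp_C", "country": "country_3"},
]
print(top_species_by_countries(records, n=2))  # ['sp_C', 'sp_A']
```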

JamieMcDevittIrwin commented 6 years ago

Great! I went back and double-checked this code, since I'd had so much trouble and wanted to make sure it was reproducible. The first time it looked weird and Gadus was back in, but I ran it twice more and everything worked great. I spot-checked the csv created the first time against the one I created in R today, and it looks like it pulled the same species.

I just wanted to make sure "validate_names" wasn't pulling different species each time.
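To go beyond a spot check, the species lists from two runs can be compared in full. This is a Python sketch assuming each run's output csv has a column named `species` (a hypothetical column name). Two empty sets mean the runs pulled exactly the same species. Any name showing up in only one run, such as a stray Gadus, points to validation not being stable between runs.

```python
import csv
import os
import tempfile


def species_in_csv(path, column="species"):
    """Read one column of a csv file into a set of species names."""
    with open(path, newline="") as f:
        return {row[column] for row in csv.DictReader(f)}


def compare_runs(path_a, path_b, column="species"):
    """Return (only_in_a, only_in_b); two empty sets mean the runs agree."""
    a = species_in_csv(path_a, column)
    b = species_in_csv(path_b, column)
    return a - b, b - a


# Demo with two made-up run outputs written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    run1 = os.path.join(tmp, "run1.csv")
    run2 = os.path.join(tmp, "run2.csv")
    with open(run1, "w", newline="") as f:
        f.write("species\nsp_A\nsp_B\nGadus morhua\n")
    with open(run2, "w", newline="") as f:
        f.write("species\nsp_A\nsp_B\n")
    only_run1, only_run2 = compare_runs(run1, run2)

print(only_run1, only_run2)  # {'Gadus morhua'} set()
```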