DataDrivenEcologicalSynthesis / VirtualBiogeography

Analyzing the biogeographical accuracy of 🐠Finding Nemo🐡
MIT License

Presence/absence vs. abundance counts #45

Closed. phylojenie closed this issue 4 years ago

phylojenie commented 4 years ago

Hi all,

Not sure if you've been following my conversation with @superornitho but I'd like to run something by everyone before we commit to it.

Should we just be recording presence/absence of a given species at our 8 different locations, rather than counting abundance of each species? The reasons for this are two-fold:

  1. The numbers we observe in the film won't necessarily scale up to reflect real-world abundance ratios, so counting each individual in a species won't be particularly useful

  2. Since we have decided to use range maps rather than individual occurrence data, we don't have any real-world abundance data to compare our observed numbers to. Rather, we're looking to track where each species occurs in the movie and what species they co-occur with, which can be done with simple presence/absence counts.

Let me know what you think. And thanks Etienne for pointing this out.

indigotran commented 4 years ago

Yes, as we talked about in the group discussion today, we will go with just presence/absence rather than counting the number of individuals of each species. (This comment is mostly for documentation, for future reference.) How do you want to divide up the screenshots, since there are 170 of them? Roughly 57 each? @phylojenie @superornitho

phylojenie commented 4 years ago

Great, I believe I mentioned it in an earlier thread but I think we should record species presence/absence with a 1 for present, 0 for absent in the corresponding location column.
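As a rough illustration of the 1/0 recording scheme, here is a minimal Python sketch. It assumes the location codes that appear elsewhere in this thread (GBR, SPO, DO, T, EA, SH, FG) are the column names; the function names and the sightings are made up for the example and are not the project's actual sheet.

```python
import csv
import io

# Location codes as they appear in this thread; treating them as column names is an assumption.
LOCATIONS = ["GBR", "SPO", "DO", "T", "EA", "SH", "FG"]


def presence_absence_rows(observations, locations=LOCATIONS):
    """Turn (species, location) sightings into one row per species with 1/0 per location."""
    seen = {}
    for species, location in observations:
        seen.setdefault(species, set()).add(location)
    rows = []
    for species in sorted(seen):
        row = {"species": species}
        for loc in locations:
            row[loc] = 1 if loc in seen[species] else 0
        rows.append(row)
    return rows


def to_csv(rows, locations=LOCATIONS):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["species", *locations], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sightings: seeing a species twice at one location still yields a single 1.
example = [
    ("Chelmon rostratus", "GBR"),
    ("Chelmon rostratus", "GBR"),
    ("Species B", "GBR"),
    ("Species B", "SH"),
]
print(to_csv(presence_absence_rows(example)))
```

Because each cell only records whether a species was seen at a location, repeated sightings in later screenshots never change the sheet.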

That sounds okay to me. Is there a way to split up the coral reef scenes among the three of us rather than giving them all to one person? Some of them are quite overwhelming to work through.

indigotran commented 4 years ago

So here's a breakdown of screenshots by movie time and location:

| Time | # of screenshots | Location identifier | Species density |
|---|---|---|---|
| 00:00 > 15:08 | 82 | GBR | Dense |
| 16:33 > 25:26 | 23 | SPO | Dense |
| 26:48 > 29:50 | 7 | DO | Moderate |
| 34:13 | 1 | T | Few |
| 40:37 > 45:53 | 6 | SPO | Few |
| 51:47 > 56:33 | 7 | EA | Few |
| 56:38 > 57:05 | 7 | SPO | Moderate |
| 57:08 > 01:26:35 | 23 | SH + DO + FG | Moderate |
| 01:31:00 > end credit | 14 | GBR + credit | Dense |

How about we each take a mix of dense, moderate, and few, i.e. 40 dense, 12 moderate, and 5 few?

The layout of the data record sheet would be like this: the list you two compiled would be left alone as the original reference list, and each location would be split into a separate file. (I show them here as different tabs, but that won't work for .csv, so we can create different files like SpeciesPresenceCount_DO, SpeciesPresenceCount_GBR, etc.) Hopefully that reduces merge conflicts, since we will be working on separate files.
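The dense/moderate/few split can be checked against the breakdown table with a short Python sketch. The counts are copied from the table; the function names and the choice of simple rounding are assumptions made for illustration.

```python
from collections import defaultdict

# (time range, screenshots, location identifier, species density), copied from the breakdown table.
BLOCKS = [
    ("00:00 > 15:08", 82, "GBR", "Dense"),
    ("16:33 > 25:26", 23, "SPO", "Dense"),
    ("26:48 > 29:50", 7, "DO", "Moderate"),
    ("34:13", 1, "T", "Few"),
    ("40:37 > 45:53", 6, "SPO", "Few"),
    ("51:47 > 56:33", 7, "EA", "Few"),
    ("56:38 > 57:05", 7, "SPO", "Moderate"),
    ("57:08 > 01:26:35", 23, "SH + DO + FG", "Moderate"),
    ("01:31:00 > end credit", 14, "GBR + credit", "Dense"),
]


def totals_by_density(blocks):
    """Sum the screenshot counts for each density class."""
    totals = defaultdict(int)
    for _time, count, _location, density in blocks:
        totals[density] += count
    return dict(totals)


def share_per_reviewer(totals, reviewers=3):
    """Approximate per-person share of each density class, rounded to whole screenshots."""
    return {density: round(count / reviewers) for density, count in totals.items()}


totals = totals_by_density(BLOCKS)
shares = share_per_reviewer(totals)
print(sum(totals.values()), totals, shares)
```

The table adds up to 119 dense, 37 moderate, and 14 few screenshots (170 in total), so a three-way split gives roughly 40, 12, and 5 each, or about 57 per person.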

[Image: example_layout]

Let me know what you think. Maybe my mind is rambling and overcomplicating things but I just want to tease the organization out as much as possible.

phylojenie commented 4 years ago

@indigotran I'm open to reworking how we organize our observations but for now I feel like the observation sheet we have going is fine? I might be missing something that would cause difficulties down the road however.

I don't see the benefit of tracking the screenshots in the same spreadsheet as the observation data, as I think it just adds a lot of unnecessary data to the sheet. Just recording presence/absence of each species at each location should be enough, no? Then we have the separate spreadsheet for tracking whether a screenshot has been analyzed and whether it contains unidentifiable species.

Again, I may be overlooking an issue with using just one spreadsheet for tracking. Let me know what you think.

indigotran commented 4 years ago

@phylojenie Honestly, I think I was way overthinking it with that approach. What we have now should already work. We will deal with merge conflicts whenever they arise.

phylojenie commented 4 years ago

@indigotran Sounds good. Hopefully merge conflicts can be avoided if we use the second spreadsheet to track which screenshots we've worked on already.

As @superornitho mentioned in the other thread, many of the dense screenshots are already taken care of (Etienne you're amazing, thank you), so we can figure out how to split up the remaining shots?

superornitho commented 4 years ago

Thanks @phylojenie, so much praise for me today, I am not used to it ;) Feels good though :)

indigotran commented 4 years ago

Oh I know why I came up with such a complicated system. I want to track it screenshot by screenshot so that we have the timeline accumulation of species like the graph in the presentation today. Let's say we go with the system we are using to record right now, how can we add that layer of data? Or should we just go with presence/absence of unidentified and make some sort of bar graphs for each location?

superornitho commented 4 years ago

The way I am doing it now is to record the name of the screenshot for every first appearance of a species. If we extract the names of these screenshots and assign them to a time-stamp, we can do a species accumulation curve this way. However, I want to be sure that recording the unidentifiable species will be useful, because going through them is going to be time consuming.
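A minimal Python sketch of that idea: read the time-stamp out of each first-appearance screenshot name and count species cumulatively. It assumes names follow the `SS_HH-MM-SS_LOCATION` pattern seen in this thread (e.g. `SS_00-01-37_GBR.png`); the species labels and the mapping below are placeholders, not real observations.

```python
import re

# Assumed naming pattern: SS_<hours>-<minutes>-<seconds>_<location>.<extension>
NAME_PATTERN = re.compile(r"SS_(\d{2})-(\d{2})-(\d{2})")


def seconds_from_name(name):
    """Convert the HH-MM-SS part of a screenshot name into seconds of movie time."""
    match = NAME_PATTERN.search(name)
    if match is None:
        raise ValueError(f"no timestamp in screenshot name: {name!r}")
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def accumulation_curve(first_appearances):
    """Map {species: first-appearance screenshot} to sorted (seconds, cumulative count) points."""
    times = sorted(seconds_from_name(shot) for shot in first_appearances.values())
    curve = []
    for count, t in enumerate(times, start=1):
        if curve and curve[-1][0] == t:
            curve[-1] = (t, count)  # several species first seen in the same screenshot
        else:
            curve.append((t, count))
    return curve


# Placeholder data: two species first seen in one screenshot, a third seen later.
first_seen = {
    "Species A": "SS_00-01-37_GBR.png",
    "Species B": "SS_00-01-37_GBR.png",
    "Species C": "SS_00-16-33_SPO.png",
}
print(accumulation_curve(first_seen))
```

The resulting points can be plotted as a step curve of species count against movie time.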

indigotran commented 4 years ago

@superornitho good work! And you should get used to the praise now because I will keep giving credit where credit is due! I will add the reviewer column to the file screenshot_analysis_status.csv so we know not to redo it, then add my name to the ones I will do.

indigotran commented 4 years ago

I think it may be useful to just record the number of unidentifiable species. Maybe when you put the screenshot name in the last column in finding-nemo_full-species-list.csv, just add brackets with the number of unidentifiable species? For example, SS_00-01-37_GBR.png[ 2-unsure]. I'll go through the collection and make the graph after! Do you think this would work?

phylojenie commented 4 years ago

Okay, I like the idea of tracking first appearance of a species by screenshot and making the accumulation curve that way, good thinking.

As for tracking unidentifiable species, I was thinking we would just indicate which screenshots have unidentifiable species in the spreadsheet that tracks which ones have been analysed, then if we had time we could go back and give them a closer look.

indigotran commented 4 years ago

@phylojenie I agree that we should focus on gathering data first, and then we can worry about analyzing it next week! I will make some changes in screenshot_analysis_status.csv now and merge them to master in the next hour, so if you're planning on making changes, please wait for my pull request first. Is that ok? @superornitho make sure that you are pushing the changes you made to the repo. I haven't seen any changes to finding-nemo_full-species-list.csv yet, so I'm going to wait until you merge your changes to avoid any merge conflict.

superornitho commented 4 years ago

Cool, I can do it that way! Do I record as unidentifiable species only the fish and other conspicuous animals? If we include the anemones and corals, pretty much every screenshot is going to contain unidentified species in the reef area.

indigotran commented 4 years ago

Hmmm, if you can count them then sure. Maybe [2-unidCoral;3-unidAnimal]? Or if you want to be more specific, there is a column in screenshot_analysis_status.csv for unidentifiable species. I'll leave this up to you and we will follow what you did, @superornitho. Also make sure to avoid using commas (,), since we're saving the file as .csv, i.e. comma-separated values.
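For later analysis, the bracket notation can be read back out of a cell. This is a minimal Python sketch that assumes each entry is a screenshot name optionally followed by `[count-label;count-label]`; the helper name `parse_entry` is hypothetical, and pairing a screenshot name with the two-part bracket is only an illustration.

```python
import re

# Assumed entry shape: <screenshot name>[<count>-<label>;<count>-<label>...], brackets optional.
ENTRY_PATTERN = re.compile(r"^\s*(?P<name>[^\[\]]+?)\s*(?:\[(?P<notes>[^\]]*)\])?\s*$")


def parse_entry(entry):
    """Split one cell entry into (screenshot name, {label: count})."""
    match = ENTRY_PATTERN.match(entry)
    if match is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    counts = {}
    notes = match.group("notes")
    if notes:
        for part in notes.split(";"):
            # "2-unidCoral" -> number "2", label "unidCoral"
            number, _, label = part.strip().partition("-")
            counts[label.strip()] = int(number)
    return match.group("name"), counts


print(parse_entry("SS_00-01-37_GBR.png[ 2-unsure]"))
print(parse_entry("SS_00-01-37_GBR.png[2-unidCoral;3-unidAnimal]"))
print(parse_entry("SS_00-01-37_GBR.png"))
```

Separating the parts with a semicolon keeps the cell free of commas, so the entry does not interfere with the .csv column separator.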

indigotran commented 4 years ago

The edit is live in pull request #53. Please go ahead and merge the changes once you have seen it. @phylojenie @superornitho

phylojenie commented 4 years ago

@indigotran Could you remind me once again how you are tracking unknown species in the spreadsheet? I am seeing the screenshot name for the first appearance of a given species, then the screenshots listed afterwards are potential appearances of that species?

phylojenie commented 4 years ago

@indigotran Or, are you just listing all unknown species screenshots at the end in the "unknown" row?

indigotran commented 4 years ago

@phylojenie If the unID species is in the same shot with another ID species, then put it in the cell of that ID species with SS_00-00-00.PNG[number_of_unid-unid]. If the unID is not in the same shot with anything else, put it in the cell of the last row. I've been tracking the screenshots with only unID species here. I also made note of this system in the README file for this folder.

phylojenie commented 4 years ago

Okay thank you for the clarification! Makes sense.

phylojenie commented 4 years ago

@indigotran One more question: if a screenshot shows multiple unID fish with one or two ID fish, how should we track that? Do we put the name of the screenshot beside one of the ID species names along with the first appearance of that species? Or do we place it with the unidentifiable screenshots?

E.g. SS_00-11-49_GBR.PNG contains Chelmon rostratus plus several unidentifiable species. However, Chelmon rostratus already appeared earlier in the movie, so it has a screenshot listed for its first appearance. Should I add this second screenshot with [4-unID] to the Chelmon rostratus row?

indigotran commented 4 years ago

@phylojenie Yes, that is what I was doing. I added the screenshot name to the row of the same ID species, even though it already has another establishing screenshot. So yes, add the second screenshot with [4-unID] to the Chelmon rostratus row.