A5-015 / flyway

A crowdedness forecasting application for NYUAD students, showcasing time-series forecasting using a recurrent neural network and modular model design
MIT License

wants data for running the code #1

Closed agl71 closed 3 years ago

agl71 commented 3 years ago

Dear sir:

The following data files referenced in your source code cannot be downloaded. Could you add them to .\src\dataset?

"final_network_sniffing_data.csv", "oui.csv", "exclude_list.csv"

Best regards,

Jackson

agl71 commented 3 years ago

And also "network_traffic_data.csv". Best regards,

woswos commented 3 years ago

Hi,

Let me see if I still have them, but they were quite large files as far as I remember.

agl71 commented 3 years ago

Thank you anyway! By the way, there are some CSV files in .\src\dataset. Could you tell me the meaning of 'cid', 'eth', 'mam', and 'manid', which might be related to the users of the MAC addresses? Your kindness will be greatly appreciated! Best regards,

agl71 commented 3 years ago

Dear Sir:

I have a question about the paper "Flyway: Predicting Foot Traffic in Open Spaces on Campus", which is in .\report\interim_report\Nishant_Barkin_Final_Project_Interim_Report.pdf. In Formula (2), the transfer probability of (i-o) depends on the sum (from in to out), but if we don't know the ground truth, how can we know that sum? By the way, I found no answer when I searched the source code in 'flyway_traffic_model.ipynb'.

Best regards,

[Image: formula2]

woswos commented 3 years ago

Sorry for the delay @agl71

1) I cannot share the whole network sniffing data since it contains a lot of private information, but it has the following format:

frame.number,frame.time,wlan.addr,wlan.ta,wlan.ra,wlan.sa,wlan.da,wlan.bssid,wlan.fc.type,wlan.fc.type_subtype,radiotap.channel.freq,radiotap.datarate,radiotap.dbm_antsignal
"1","Nov 10, 2019 17:44:56.905180894 +04","ff:ff:ff:ff:ff:ff","34:97:f6:ac:9c:10","ff:ff:ff:ff:ff:ff","34:97:f6:ac:9c:10","ff:ff:ff:ff:ff:ff","34:97:f6:ac:9c:10","0","8","2457","1","-45"

You can sniff your own data using Wireshark in the format specified above. As far as I remember, the final version of the file was the one with the unused columns dropped. You can see which columns were dropped in the Jupyter Notebook file (or possibly in the paper).

2) You can check this Wireshark Cheatsheet to understand what each column means in the sniffing data.

3) oui.csv is this file but in csv format.

4) exclude_list.csv is a list of devices to exclude from sniffing. I don't have it and you might not need it at all.

5) I believe @niniack can answer the question related to the formula better.

6) I think we used the 'cid', 'eth', 'mam', and 'manid' files to determine which devices were mobile phones and which ones were not. For example, Dell doesn't have phones, so the MAC addresses registered under Dell are not mobile phones.
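To make points 1) and 6) more concrete, here is a minimal Python sketch of reading a file in the format above, keeping a few columns, and using the manufacturer prefix of a MAC address to rule out non-phone devices. This is not the code from the notebook. The names `KEEP`, `OUI_TO_VENDOR`, `NON_PHONE_VENDORS`, `load_rows`, and `is_possible_phone` are hypothetical, the vendor names are placeholders rather than real registry entries, and the choice of columns to keep is an assumption.

```python
import csv
import io

# Header and sample row copied from the format shown in point 1).
SAMPLE = (
    "frame.number,frame.time,wlan.addr,wlan.ta,wlan.ra,wlan.sa,wlan.da,"
    "wlan.bssid,wlan.fc.type,wlan.fc.type_subtype,radiotap.channel.freq,"
    "radiotap.datarate,radiotap.dbm_antsignal\n"
    '"1","Nov 10, 2019 17:44:56.905180894 +04","ff:ff:ff:ff:ff:ff",'
    '"34:97:f6:ac:9c:10","ff:ff:ff:ff:ff:ff","34:97:f6:ac:9c:10",'
    '"ff:ff:ff:ff:ff:ff","34:97:f6:ac:9c:10","0","8","2457","1","-45"\n'
)

# Assumption: columns worth keeping; the notebook may keep a different set.
KEEP = ("frame.time", "wlan.sa", "radiotap.dbm_antsignal")

# Hypothetical stand-in for oui.csv: MAC prefix -> manufacturer name.
# The vendor names are placeholders, not real registry entries.
OUI_TO_VENDOR = {"34:97:f6": "ExampleVendorA", "00:11:22": "ExampleVendorB"}

# Hypothetical list of manufacturers assumed not to make phones.
NON_PHONE_VENDORS = {"ExampleVendorB"}


def oui_prefix(mac):
    """Return the first three octets of a MAC address, lower-cased."""
    return mac.lower()[:8]


def load_rows(text):
    """Parse sniffing CSV text and keep only the columns listed in KEEP."""
    reader = csv.DictReader(io.StringIO(text))
    return [{col: row[col] for col in KEEP} for row in reader]


def is_possible_phone(mac):
    """Return False only if the MAC's vendor is on the non-phone list.

    MACs with an unknown prefix are kept, since nothing rules them out.
    """
    vendor = OUI_TO_VENDOR.get(oui_prefix(mac))
    return vendor not in NON_PHONE_VENDORS


rows = load_rows(SAMPLE)
phone_rows = [r for r in rows if is_possible_phone(r["wlan.sa"])]
```

Filtering on the source address (`wlan.sa`) is also an assumption; any of the address columns could be used the same way.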

agl71 commented 3 years ago

@woswos Thank you very much! You are so kind! @niniack I think that the ground truth must be known when we calculate the transfer probability of (i-o) in Formula (2), but I did not find any part of the source code that does this.

niniack commented 3 years ago

@agl71 Hey, thanks for checking Flyway out :) I looked through the report and found the equation you are referring to.

I am not sure if you have easy access to the text, but if you read around the equation, you will see that it is actually something we referenced from another paper, titled "Occupancy prediction through Markov based feedback recurrent neural network (M-FRNN) algorithm with WiFi probe technology" by Wang et al. We do not make the claim that we used this equation or any of their methodology directly!

In fact, in our constraints section on the same page, we say:

The work done in Wang. et. al. was a source of inspiration for us to explore machine learning models used in time-series forecasting. However, the specifics of their approach did not align with our project as they had a greater number of validational resources at hand.

We do not use an approach that involves a transfer probability matrix!

The ground truth of our model is not actually the number of people because that would have involved us counting the number of people as part of data collection (which was not feasible for us). Our ground truth is the number of devices we pick up in our scanning!
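As a rough illustration of what "number of devices" means as a target, here is a minimal Python sketch that counts distinct MAC addresses per fixed time window. It is not the code from the notebook. The function names, the 10-minute window, and the decision to drop the UTC offset and fractional seconds are all assumptions made for the example, and the month-name parsing assumes an English locale.

```python
from collections import defaultdict
from datetime import datetime


def parse_frame_time(raw):
    """Parse a frame.time string such as
    'Nov 10, 2019 17:44:56.905180894 +04' into a naive datetime.

    Simplification: the UTC offset and fractional seconds are discarded.
    """
    without_offset = raw.rsplit(" ", 1)[0]
    whole_seconds = without_offset.split(".", 1)[0]
    return datetime.strptime(whole_seconds, "%b %d, %Y %H:%M:%S")


def devices_per_window(records, window_minutes=10):
    """Count distinct MAC addresses seen in each fixed-width time window.

    records: iterable of (frame_time_string, mac_address) pairs.
    window_minutes: window width; assumed to divide 60 evenly.
    Returns a dict mapping each window's start time to its device count.
    """
    seen = defaultdict(set)
    for raw_time, mac in records:
        t = parse_frame_time(raw_time)
        minute = (t.minute // window_minutes) * window_minutes
        start = t.replace(minute=minute, second=0, microsecond=0)
        seen[start].add(mac.lower())
    return {start: len(macs) for start, macs in sorted(seen.items())}
```

Each window's count would then be one point in the time series that a forecasting model is trained to predict.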

Happy to answer more questions.

agl71 commented 3 years ago

@niniack Hey, thank you very much! Best regards,

niniack commented 3 years ago

Glad we could help