Closed Vopaaz closed 4 years ago
Can you explain in the README documentation how to run the data-migration Python code? I am new to this project and a bit confused.
Added in the latest commit.
What is the purpose of using Python to put data into the database? We could just execute SQL data files to do this.
Because the data is dynamically updated: we pull data from Baidu's API each day, reshape it into the desired format, and store it in the DB.
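The daily pull-reshape-store step might look something like the sketch below. This is only an illustration of the pattern, not the project's actual code: `fetch_migration_json` stands in for the real Baidu API call, and the field names are hypothetical.

```python
import json

# Stand-in for the real Baidu API call (requests.get(...) in practice).
# The payload shape here is hypothetical.
def fetch_migration_json():
    return '{"data": {"list": {"20200201": 1.23, "20200202": 1.45}}}'

def reshape(raw_json):
    """Turn the API's date->index mapping into rows ready for DB insertion."""
    payload = json.loads(raw_json)
    return [(date, value) for date, value in sorted(payload["data"]["list"].items())]

rows = reshape(fetch_migration_json())
# Each row is a (date_string, index_value) pair, e.g. ("20200201", 1.23);
# in the real pipeline these rows would then be inserted into the DB.
```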
[feng@bcm point-to-point-migration]$ python integration.py
Traceback (most recent call last):
File "integration.py", line 101, in <module>
res = get_p2p_overall_dataframe()
File "integration.py", line 55, in get_p2p_overall_dataframe
history_curve = load_history(date, row.adcode)
File "integration.py", line 26, in load_history
update_history_if_outdated("in", city_id)
File "integration.py", line 18, in update_history_if_outdated
with open(path, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: './temp/move_in_history_110000.txt'
This happens the first time I run the migration code. Please help.
Would you please try the latest version? @zhaofeng-shu33
By the way, if you want to run the code for test purposes, please comment those lines that actually dump data into the DB.
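Instead of commenting out the DB-write lines by hand, one pattern is to gate them behind a dry-run flag. This is a hypothetical sketch; `DRY_RUN` and `dump_to_db` are illustrative names, not part of the actual scripts.

```python
import os

def dump_to_db(rows, dry_run=None):
    """Insert rows into the DB, unless a dry run was requested.

    dry_run defaults to the DRY_RUN environment variable, so test runs
    can skip the real insert without editing the code.
    """
    if dry_run is None:
        dry_run = os.environ.get("DRY_RUN", "0") == "1"
    if dry_run:
        return 0  # skip the real insert while testing
    # ... the real INSERT statements would go here ...
    return len(rows)
```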
I will try again.
So why sleep for 1 second on every request? https://github.com/Glacier-Ice/data-sci-api/blob/fix-index-data/src-etl/point-to-point-migration/crawl.py#L22
After commenting out the sleep code, I get a lot of txt files from running python integration.py. So how can I put the data from those txt files into the database?
If you request the API too frequently you will be blocked.
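The sleep implements a simple throttle: pausing between successive requests keeps the crawler under the provider's rate limit. A minimal sketch of the pattern, with illustrative names (`crawl_all`, `fetch_one` are not the project's actual functions):

```python
import time

def crawl_all(city_ids, fetch_one, delay=1.0):
    """Fetch data for each city, pausing between requests.

    The one-second default mirrors the sleep in crawl.py; requesting
    too frequently gets the client blocked by the API.
    """
    results = {}
    for i, city_id in enumerate(city_ids):
        results[city_id] = fetch_one(city_id)
        if i < len(city_ids) - 1:
            time.sleep(delay)  # throttle to stay under the rate limit
    return results
```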
For main.py: I found that you load the database config from an environment variable. But why do you use json.loads? That actually parses a string.
In the production environment, the config is indeed stored directly in the environment variable rather than as a path.
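That is, the distinction being discussed: json.loads parses a JSON string, while json.load reads from a file object. With the whole config in an environment variable, json.loads is the right call. A sketch of the pattern, with "DB_CONFIG" as an illustrative variable name:

```python
import json
import os

# In production the entire JSON config lives in an env var ("DB_CONFIG"
# here is an illustrative name), so we parse the string with json.loads.
os.environ["DB_CONFIG"] = '{"host": "localhost", "port": 5432}'

config = json.loads(os.environ["DB_CONFIG"])
# config is now a plain dict, e.g. config["host"] == "localhost".
```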
So I have to save the JSON string in this environment variable, instead of a JSON file path?
That's your choice. Personally, I have a local branch that uses json.load and sets CONFIG_PATH to the config file path. After development, I cherry-pick all the necessary commits onto the remote-tracking branch and push.
That's cool
I see that the script in main.py creates two tables, "migration_index" and "p2p_migration". What is the purpose of these two tables? Are they used in the API server?
Hi @zhaofeng-shu33. Very good question; apparently we need a guide introducing the tables. p2p_migration holds the migration index from city to city (which is what p2p stands for), starting from late January 2020, because the API only goes back 30 days. migration_index stores the outflow and inflow index for a single city, and could ideally date back to 2019 once fixed. I bet someone may be interested in comparing the migration index year over year to measure this year's drop. http://qianxi.baidu.com/ may give you some intuition.
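For intuition, the two tables described above might be laid out roughly as follows. This schema is purely illustrative; the real column names and types in main.py may differ.

```python
import sqlite3

# Illustrative-only schema for the two tables discussed above,
# demonstrated with an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE p2p_migration (     -- city-to-city index, from late January 2020
    date       TEXT,
    from_city  TEXT,
    to_city    TEXT,
    idx        REAL
);
CREATE TABLE migration_index (   -- per-city inflow/outflow index, ideally back to 2019
    date       TEXT,
    city       TEXT,
    direction  TEXT,             -- 'in' or 'out'
    idx        REAL
);
""")
```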
Would any of you write a short manual, so others can avoid the trouble of setting up the config again? Thanks a lot.
I have added some documentation.