Disfactory / SpotDiff

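The SpotDiff project aims to let netizens compare satellite images taken before and after May 20, 2016, to determine whether the buildings at the suspected factory addresses in the Council of Agriculture's 50,000 records are newly built, so that reporting efforts can be focused, or so that every suspected factory site in Taiwan can be scanned.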
MIT License

Task 1: implement the Location table (table structure, operation functions, and testing functions) #9

Closed yalgorithm777 closed 2 years ago

yalgorithm777 commented 2 years ago

In this task, you need to implement the location table in our data model and the related functionality.

IMPORTANT: make sure that you read the Coding Standards section before writing code.

IMPORTANT: open separate branches and request code reviews to merge into the main branch when the subtasks are done.

There are three subtasks, described below:

Please reply to this issue if there are questions.

Sourbiebie commented 2 years ago

I have set up the development environment successfully and started working on this task.

Sourbiebie commented 2 years ago

I have a few questions about location fields in the data model. Please kindly comment. @yalgorithm777

  1. "Source (url root)" : Does it mean the url of a picture? If so, is text suggested to be the data type, or string of limited size?
  2. Inner bounding box: Is it a rectangle in pixels? If so, is it appropriate to leave 2 fields for it: inner_bounding_box_l and inner_bounding_box_w, which are both integers?
  3. zoom: Is it an integer or a float?
  4. The factory_id + year should be able to identify a row in the location table. However, do we need a unique id to identify a location?

yalgorithm777 commented 2 years ago

  1. "Source (url root)" means the URL to the location on the map that we need to provide to the front-end. For data type, I think we can just use the string type with no character limit, for example:

    url = db.Column(db.String, nullable=False)
  2. The inner bounding box is the red box on this slide. We need the coordinates (latitude, longitude) of the top left corner and the bottom right corner. So we need 4 fields with the float data type.

  3. Good question about the zoom level data type @deeper747 @aelcenganda can you answer this question?

  4. I think it is better to have the ID column anyway as the primary key.

    id = db.Column(db.Integer, primary_key=True)
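
Putting the four answers above together, the proposed table could be sketched as follows. This is only a minimal illustration, not the project's actual model: it uses the standard library's `sqlite3` instead of the ORM shown above so that it runs without dependencies, and every column name other than `id` and `url` is a hypothetical placeholder. `zoom` is stored as an integer purely as a placeholder, since its type is still an open question at this point in the thread.

```python
import sqlite3

# Hypothetical schema: column names other than `id` and `url` are assumptions.
SCHEMA = """
CREATE TABLE location (
    id INTEGER PRIMARY KEY,            -- surrogate key, as suggested in answer 4
    factory_id TEXT NOT NULL,
    year INTEGER NOT NULL,
    url TEXT NOT NULL,                 -- unbounded string, as in answer 1
    zoom INTEGER,                      -- placeholder type (open question 3)
    bbox_top_left_lat REAL,            -- four floats, as in answer 2
    bbox_top_left_lng REAL,
    bbox_bottom_right_lat REAL,
    bbox_bottom_right_lng REAL,
    UNIQUE (factory_id, year)          -- the natural key from question 4
);
"""


def create_location_table(conn: sqlite3.Connection) -> None:
    """Create the sketched location table on the given connection."""
    conn.execute(SCHEMA)


def add_location(conn: sqlite3.Connection, factory_id: str, year: int, url: str) -> int:
    """Insert one location row and return its generated id."""
    cur = conn.execute(
        "INSERT INTO location (factory_id, year, url) VALUES (?, ?, ?)",
        (factory_id, year, url),
    )
    return cur.lastrowid
```

Keeping both the surrogate `id` and the `UNIQUE (factory_id, year)` constraint means other tables can reference a location by a single integer while duplicate factory/year pairs are still rejected.
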
Sourbiebie commented 2 years ago

About 2, could @aelcenganda confirm? When I asked last week whether it is lat/lon, she said it is not and that it should be in pixels, as far as I remember...

aelcenganda commented 2 years ago

Some additions:

  1. "Source (url root)" means the URL to the location on the map that we need to provide to the front-end. For data type, I think we can just use the string type with no character limit, for example:

    url = db.Column(db.String, nullable=False)

    "Source (url root)" means the URL from which we retrieve the map view to embed in our webpage. "(url root)" means that at that time we considered logging only the domain name rather than the full url. We just need to know where the map view comes from in case we use other sources as well.

  2. The inner bounding box is the red box on this slide. We need the coordinates (latitude, longitude) of the top left corner and the bottom right corner. So we need 4 fields with the float data type.

Let me highlight the red "inner bounding box" on this screenshot (image: screenshot taken 2021-10-28, 4:24:59 PM).

For the frontend, the red "inner bounding box," with a cross at the center, will be set to a constant size in pixels on every device, so that the area we want users to mark is roughly the same.

For the backend, it would be nice to log the coordinates (latitude, longitude) of the top left corner and the bottom right corner. Nevertheless, to my knowledge, the frontend won't directly send the coordinates of the "inner bounding box" back to the backend, as the corner coordinates can be calculated from the center coordinates, the zoom level, and the inner bounding box size in pixels.

Perhaps a better way is to store both the frontend display range in pixels (int) and the corner coordinates? Or is that redundant?

  3. Good question about the zoom level data type @deeper747 @aelcenganda can you answer this question?

As for the zoom level data format, you should ask @LittleWhiteYA. He studied the API of SPOT, an academic satellite image website we are using for SpotDiff. @LittleWhiteYA , would you mind pointing us to the research you have done on calling the SPOT API?

We discussed how to embed the map preview on the webpage at two meetups. I remember the zoom level of SPOT is an integer, but I don't know whether the zoom level of other map sources is also an integer. We may retrieve data from other map sources such as OpenStreetMap, Google Earth, or Mapbox.

Discussion on satellite image sources: https://github.com/Disfactory/SpotDiff/issues/1

Meetup discussion notes: https://g0v.hackmd.io/_60KgrNgQmKIJQ8uBGpWlQ https://g0v.hackmd.io/TkItN6veTNGZaT-KVIvbSQ

  4. I think it is better to have the ID column anyway as the primary key.
    id = db.Column(db.Integer, primary_key=True)

    I agree.

aelcenganda commented 2 years ago

FYI, the coordinate system we use for disfactory.tw is WGS84. I haven't checked which coordinate system SPOT is using right now (it may be different; as far as I remember, there are at least three popular coordinate systems used in Taiwan GIS).

@yellowsoar , do you have any suggestions or reminders on using data from SPOT?

Sourbiebie commented 2 years ago

For the frontend, the red "inner bounding box," with a cross at the center, will be set to a constant size in pixels on every device, so that the area we want users to mark is roughly the same.

For the backend, it would be nice to log the coordinates (latitude, longitude) of the top left corner and the bottom right corner. Nevertheless, to my knowledge, the frontend won't directly send the coordinates of the "inner bounding box" back to the backend, as the corner coordinates can be calculated from the center coordinates, the zoom level, and the inner bounding box size in pixels.

Perhaps a better way is to store both the frontend display range in pixels (int) and the corner coordinates? Or is that redundant?

I see. I think the rectangle should be in lat/lon as yalgorithm777 described. The frontend takes charge of translating between lat/lon and the physical display coordinates on the device or browser.

yalgorithm777 commented 2 years ago

thanks @aelcenganda for the clarifications

Sourbiebie commented 2 years ago

Two more questions.

  1. Should the "year" (Integer) be "created_at" (datetime)? deeper said it's the time the factory data was created, and I guess it's the data we need to import into the location table. I'm just not sure whether the date/time needs to be truncated.
  2. I can't create a branch for submitting the code for this task. Could @aelcenganda help?

aelcenganda commented 2 years ago

year and created_at are different things.

  1. According to the data model of SpotDiff, year is an integer attribute of the location table that marks the year in which the third-party geo source took the satellite photo, for example year = 2017. The year is very important to SpotDiff because we want to diff satellite images of the same location in different years to find which factories were built after May 20th, 2016, and should therefore be demolished under the new law. The government data we have so far only show all the factories existing in 2019, so we can't flag which factories to report based on known factory locations.

BTW, @deeper747 , do we need to save which month of that year the satellite photo was taken? Is that information available from the source website?

  2. created_at is a datetime attribute in the factory table that states when the factory ID was created. The location data attached to a FactoryID is created at the same time. created_at logs when we scraped the data from the government website or when a user added a new factory report with geo coordinates, depending on the source recorded in the factory table.

deeper747 commented 2 years ago

BTW, @deeper747 , do we need to save which month of that year the satellite photo was taken? Is that information available from the source website?

Although it's good to have it, the "month" attribute is not available from the source.

dyfu95 commented 2 years ago

For the frontend, the red "inner bounding box," with a cross at the center, will be set to a constant size in pixels on every device, so that the area we want users to mark is roughly the same.

For the backend, it would be nice to log the coordinates (latitude, longitude) of the top left corner and the bottom right corner. Nevertheless, to my knowledge, the frontend won't directly send the coordinates of the "inner bounding box" back to the backend, as the corner coordinates can be calculated from the center coordinates, the zoom level, and the inner bounding box size in pixels.

Perhaps a better way is to store both the frontend display range in pixels (int) and the corner coordinates? Or is that redundant?

@aelcenganda @Sourbiebie I found some formulas for converting pixels into lat/long; they may be usable for logging the coordinates of the inner bounding box:

  1. According to the OpenStreetMap wiki, 1px corresponds to a certain number of meters at each zoom level. For example, at zoom level 17, 1px is equal to 1.1943 meters at the equator.
  2. According to the converter, we can get a new location's lat/long by adding a certain number of kilometers to a known location. (The formula can be seen by choosing View Page Source, "檢視原始碼".)
  3. In summary, if we know the latitude and longitude of the center, and the width and height in pixels of the inner bounding box, we can calculate the coordinates of the top left corner and the bottom right corner.
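
The three steps above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not a confirmed implementation: it assumes the map uses the Web Mercator projection with 256-pixel tiles (as OpenStreetMap does; whether SPOT does is not confirmed in this thread), it approximates one degree of latitude as 111,320 meters, and the function names are hypothetical.

```python
import math

# Ground resolution at the equator for zoom 0 (Web Mercator, 256-px tiles).
EQUATOR_M_PER_PX_Z0 = 156543.03392
# Rough length of one degree of latitude in meters (an approximation).
METERS_PER_DEGREE_LAT = 111_320.0


def meters_per_pixel(lat_deg: float, zoom: int) -> float:
    """Ground distance covered by one pixel at this latitude and zoom level."""
    return EQUATOR_M_PER_PX_Z0 * math.cos(math.radians(lat_deg)) / (2 ** zoom)


def bounding_box_corners(center_lat, center_lng, zoom, width_px, height_px):
    """Return ((top_left_lat, top_left_lng), (bottom_right_lat, bottom_right_lng)).

    Valid only for small boxes, where the scale is roughly constant.
    """
    mpp = meters_per_pixel(center_lat, zoom)
    # Half the box height and width, converted from pixels to meters to degrees.
    half_h_deg = (height_px / 2) * mpp / METERS_PER_DEGREE_LAT
    half_w_deg = (width_px / 2) * mpp / (
        METERS_PER_DEGREE_LAT * math.cos(math.radians(center_lat))
    )
    top_left = (center_lat + half_h_deg, center_lng - half_w_deg)
    bottom_right = (center_lat - half_h_deg, center_lng + half_w_deg)
    return top_left, bottom_right
```

For example, `meters_per_pixel(0.0, 17)` gives about 1.1943, matching the figure quoted in step 1. At Taiwan's latitude the value is smaller because of the cosine factor.
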

yalgorithm777 commented 2 years ago

question for @aelcenganda @deeper747 @DotSea

How are we going to determine the bounding box exactly? I wonder if it is possible to just let the front-end decide it (so that the backend does not need to return the bounding box).

Sourbiebie commented 2 years ago

I talked to deeper yesterday. In his opinion, the bounding box should be a rectangle of fixed length and width, although it may need to be translated to pixels when the front-end displays it. We can confirm with the front-end pals tonight.

deeper747 commented 2 years ago

Data Flow

  1. Backend gives the FactoryID to the frontend.
  2. Frontend requests lng/lat for that FactoryID from disfactory.tw.
  3. Frontend requests the satellite picture from NCU SPOT.
  4. When users submit their answers, the frontend sends the difference in lng and lat between the bounding box's corner and its center back to the DB.
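
To make the last step of this data flow concrete, here is a small sketch of how the corner coordinates could be recovered from the submitted differences. It assumes the box is symmetric about its center, so a single (lat, lng) offset is enough; the class and function names are hypothetical and do not come from the SpotDiff codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBoxOffset:
    """Absolute lat/lng distance from the box center to a corner (hypothetical)."""
    d_lat: float
    d_lng: float


def corners_from_offset(center_lat: float, center_lng: float, offset: BoundingBoxOffset):
    """Rebuild the top-left and bottom-right corners of a box centered on the point."""
    top_left = (center_lat + offset.d_lat, center_lng - offset.d_lng)
    bottom_right = (center_lat - offset.d_lat, center_lng + offset.d_lng)
    return top_left, bottom_right
```

Storing only the offset keeps the payload small, but it ties the box to the center point; storing both corners explicitly would avoid that coupling.
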

yenchiah commented 2 years ago

We need to think more about the last step. If the back-end receives the answer with the bounding box from the front-end, then the bounding box information should be recorded in the Answer table, not the Location table. So we will need to revise the data model.

Sourbiebie commented 2 years ago

Besides, I just found that the bounding box center can be moved away from the factory location, so the bounding box position is no longer tied to the factory location. How about just using the upper-left and lower-right points to describe the bounding box, as originally planned, which is more straightforward? Sorry for the inconvenience. @deeper747 @aelcenganda

Sourbiebie commented 2 years ago

Sorry I closed it unexpectedly.

Sourbiebie commented 2 years ago

We need to think more about the last step. If the back-end receives the answer with the bounding box from the front-end, then the bounding box information should be recorded in the Answer table, not the Location table. So we will need to revise the data model.

If this is confirmed, I'll update the code of the location table. I have implemented the bounding box in the answer table, but am waiting for the decision before pulling the code for task 1 and task 3.

yalgorithm777 commented 2 years ago

We need to think more about the last step. If the back-end receives the answer with the bounding box from the front-end, then the bounding box information should be recorded in the Answer table, not the Location table. So we will need to revise the data model.

If this is confirmed, I'll update the code of the location table. I have implemented the bounding box in the answer table, but am waiting for the decision before pulling the code for task 1 and task 3.

Let's chat about this at this week's meeting.

Sourbiebie commented 2 years ago

Done; waiting for code review.