ESIPFed / gsoc

Project ideas and mentor guidance for ESIP members to participate in Google Summer of Code.

CNN Classifier for High-Resolution Daytime Satellite Images Linked to the Nighttime Infrared Detections #7

Closed zhizhin closed 6 years ago

zhizhin commented 6 years ago

Idea

VIIRS Nightfire (http://www.mdpi.com/2072-4292/5/9/4423) provides anchors to possible gas flare / industrial site / fire or explosion events with ~100 m location error, but we can’t see them in high resolution at night. It would be interesting to apply a deep learning image classifier (Regions with Convolutional Neural Networks, R-CNN) to small daytime subsets of Google Maps high-resolution satellite images (or the DigitalGlobe archive) to identify the sources of the nighttime infrared signal.

We can distinguish between oil fields / refineries / steel mills / biomass fires by eye, but the two NOAA satellites, Suomi NPP and JPSS-1, provide >150,000 infrared detections every night, and it is not feasible to attribute them manually.
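As a rough illustration of cutting a "small subset" around each detection, the sketch below computes a square latitude/longitude window centered on an anchor point. The function name, the 500 m half-width, the example coordinates, and the spherical-Earth approximation are all assumptions made for illustration; none of them come from the Nightfire product.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def bounding_box(lat_deg, lon_deg, half_width_m=500.0):
    """Return (south, west, north, east) in degrees for a square window
    centered on a detection. Hypothetical helper; not valid near the poles."""
    dlat = math.degrees(half_width_m / EARTH_RADIUS_M)
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = math.degrees(
        half_width_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg)))
    )
    return (lat_deg - dlat, lon_deg - dlon, lat_deg + dlat, lon_deg + dlon)


# Example: a 1 km x 1 km window around a made-up point.
south, west, north, east = bounding_box(48.0, -103.0)
print(f"{south:.5f} {west:.5f} {north:.5f} {east:.5f}")
```

The resulting window could then be passed to whatever imagery source is used to request the daytime picture for that location.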

Typical images from Google Maps anchored to the Nightfire detections from January 24, 2018:

Gas flares in North Dakota image

Refinery in Texas image

Skills Needed

TensorFlow, possibly Google Cloud Vision API, Python, QGIS

Mentor

Mikhail Zhizhin, CU Boulder / NOAA

stbnps commented 6 years ago

Hi!

Cool project overall, but I have some questions.

Is this project research-centered (find the best model and build a simple set of scripts), or does it require heavy effort on additional functionality (a nice GUI, integration with other software, etc.)?

Do you already have a baseline to compare models? If not, does the project involve research on existing models to use them as baseline?

Why are you focusing on detection (with RCNN)? (Semantic segmentation may also solve the problem)

I do have access to a K40 GPU; however, do you provide infrastructure to train the models, or should we use our own?

What about nonfunctional requirements? Does the model need to be trained or run with limited hardware resources, specific software libraries, etc.?

Thanks!

zhizhin commented 6 years ago

Hi,

The project is oriented toward research only. GUIs and the like may become an excuse to avoid the main problem: correct classification.

We have all the input, namely candidate locations (no need to improve) and a way to get high-res imagery for each of them (this can be improved).

We need the candidates labeled with certainty numbers, say a given location is 95% gas flare and 5% refinery.
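One common way to obtain such certainty numbers is to pass the classifier's raw scores through a softmax. The sketch below is only an illustration: the class list, the function names, and the example scores are assumptions, not the project's actual classes or model output.

```python
import math

# Illustrative class list; the real set of classes is up to the project.
CLASSES = ["gas flare", "refinery", "steel mill", "biomass fire"]


def softmax(scores):
    """Convert raw classifier scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_certainty(scores):
    """Pair each class name with its probability, most likely first."""
    probs = softmax(scores)
    return sorted(zip(CLASSES, probs), key=lambda pair: pair[1], reverse=True)


# Made-up scores for one candidate location.
for name, p in label_with_certainty([4.0, 1.0, -1.0, -2.0]):
    print(f"{name}: {p:.1%}")
```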

The baseline is our expert-assigned labels for > 10,000 flares.

The student can pick a deep learning platform (RCNN or whatever) upfront, provided the choice is supported with some arguments, either theoretical or from test runs.

We have no GPU infrastructure in house, but I can apply to use PRP (Kubeflow ?) if needed.

Regards

Misha


utkarshrai commented 6 years ago

Hello! I'm interested in applying for this project but I have some questions.

  1. You specify that this is a research project with "correct classification" as the goal. Does that mean we'll be spending most of the time on analysis and feature compilation?
  2. What will be the procedure for judging the applications? What should we do before applying?
  3. Is there an existing repo with some beginner code, or somewhere this data is hosted, where applicants can play around with the pictures without downloading GBs of data (Kaggle maybe)? Any help in getting more familiar with the data is appreciated.

Konsang commented 6 years ago

Hi @zhizhin

Your project looks really interesting, and I'm interested in applying for it. But I'm not sure I clearly understand the problem. As @stbnps mentioned earlier, 'Why are you focusing on detection (with RCNN)? (Semantic segmentation may also solve the problem)'. Are there any warm-up exercises on which the eligible candidates will be shortlisted?

Thanks !

zhizhin commented 6 years ago

To answer the questions:

  1. We have a manually processed list of 10,000+ persistent high-temperature nighttime combustion sources (flares, volcanoes, etc.) for the years 2012-2016. The NN classifier will be compared with the expert labeling to measure its success rate. If the NN output conflicts with our expertise, it will not be useful for classifying the 2017 data.

  2. For the warm-up I will prepare a smaller data sample. It will make life easier for both of us. However, I am not interested in a “try and go away” approach.
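The comparison with the expert labeling in point 1 could be scored along these lines. This is a minimal sketch: the helper names and the five example labels are made up for illustration, and the real evaluation may well use a different metric.

```python
from collections import Counter


def success_rate(expert_labels, predicted_labels):
    """Fraction of sources where the classifier agrees with the expert label."""
    if not expert_labels or len(expert_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    agree = sum(e == p for e, p in zip(expert_labels, predicted_labels))
    return agree / len(expert_labels)


def confusion_counts(expert_labels, predicted_labels):
    """Count (expert, predicted) pairs to show which classes get confused."""
    return Counter(zip(expert_labels, predicted_labels))


# Made-up labels for five sources.
expert = ["flare", "flare", "volcano", "refinery", "flare"]
predicted = ["flare", "refinery", "volcano", "refinery", "flare"]
print(success_rate(expert, predicted))
print(confusion_counts(expert, predicted))
```

The pair counts make it visible which expert classes the classifier tends to mistake for which others, something a single success rate hides.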

Misha


Konsang commented 6 years ago

That sounds perfect! Any word regarding the decision timeline and when the mini dataset will be released?

stbnps commented 6 years ago

I've re-read the problem statement, and I have another question:

I understand that the pipeline goes like this: You first have a location candidate for a fire (night time image), then you analyze a (large) ROI around that location and classify it as oil fields / refineries / steel mills / biomass (day time image).

You have a location for the event, so you already have part of the information, and you may now want to refine it and calculate some boundaries. Using selective search (R-CNN) may compromise the classification accuracy due to the selection of suboptimal region candidates and the warping applied to them before classification. Using boundary regression methods (YOLO, Faster R-CNN) may be hard since we do not have ground truth information on boundaries and so cannot compute the IoU (correct me if I'm wrong). Segmentation networks will be hard to train too.
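For reference, the IoU (intersection over union) mentioned above compares a predicted box with a ground truth box, which is why it cannot be computed without annotated boundaries. A minimal sketch, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max) in pixel coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes,
    each given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two made-up 2 x 2 boxes overlapping in a 1 x 1 square.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```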

What trade off do you want to make between classification accuracy, location (distance to the anchor) accuracy, and bounding box accuracy (region boundaries), and what information do we have besides the day time image and the anchor location (eg: bounding boxes, satellite infrared images...)?

Thanks!

zhizhin commented 6 years ago

Esteban, to answer your question:

Accuracy of classification is the top priority. Exact location is not needed, but may be useful as a side product.

The problem is that the IR anchor can point to an extended object with multiple flares or a large industrial site with a furnace inside the walls. There is no way to decide which flare or building was contributing to the heat source. We need to classify the objects by type. Their “exact” location will come from averaging the IR detections.

Misha


Konsang commented 6 years ago

So there is an intermediate step of data annotation because, without it, it'll be hard to classify any RoI. We can use an existing dataset like SpaceNet, which has annotated regions, to propose specific RoIs, but classifying them between the two NOAA satellites won't be possible without manually labelling the regions with the help of software like QGIS.

zhizhin commented 6 years ago

Kaustav, I am afraid you are out of scope here.

The classifier has nothing to do with JPSS-1 or SNPP. They are LOW resolution imagers. The real input for the classifier should be 0.5 - 1 m resolution black-and-white or color images from WorldView-3 or whatever. Even Landsat 8 / Sentinel-* imagery is too coarse for our task.

Misha


Konsang commented 6 years ago

I'm sorry, I've updated my comment.

Thanks!

zhizhin commented 6 years ago

To put it simply:

From SNPP / JPSS-1 we know that something is constantly hot at night inside a 1 square kilometer pixel.

Using 1 m resolution daytime images (maybe for several dates) of this square kilometer, can we tell with certainty whether it is an oil well, a refinery or a steel mill?
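At 1 m resolution, one square kilometer is roughly a 1000 x 1000 pixel image, which is larger than the input of many CNNs. One option is to split it into overlapping tiles and classify each tile. The sketch below only computes tile offsets; the 224-pixel tile size and 200-pixel stride are assumptions chosen for illustration, not project requirements.

```python
def tile_origins(image_size, tile_size, stride):
    """Top-left offsets along one axis so that tiles cover the whole image.
    The last tile is shifted back so it ends exactly at the image border."""
    if tile_size > image_size:
        raise ValueError("tile is larger than the image")
    origins = list(range(0, image_size - tile_size + 1, stride))
    if origins[-1] != image_size - tile_size:
        origins.append(image_size - tile_size)
    return origins


def tile_grid(image_size=1000, tile_size=224, stride=200):
    """All (x, y) top-left corners for a square image, row by row."""
    offsets = tile_origins(image_size, tile_size, stride)
    return [(x, y) for y in offsets for x in offsets]


tiles = tile_grid()
print(len(tiles), tiles[0], tiles[-1])
```

Per-tile predictions would still need to be combined into one label for the whole pixel, for example by taking the most confident class.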

Misha


Konsang commented 6 years ago

Certainly. I was just a bit confused because the training and testing had varying pixel size.

Thanks for the clarification!

fenilsuchak commented 6 years ago

Hi @zhizhin, I am interested in applying for this project. I went through the description and the discussion. What still confuses me is that you mentioned "Accuracy is the top priority and exact location isn't so necessary". But given that there can be multiple objects in the image, I suppose bounding boxes would be important. So my question is: will the IoU metric carry any importance at all?

And also a sample dataset would be very useful. Thanks.

Fenil

danijak commented 6 years ago

Hi, I am interested in this project.

leandroleal commented 6 years ago

Hello Dr. Zhizhin

This problem is very interesting and I'm excited to apply for it. I have experience with deep learning and remote sensing, specifically semantic classification (using U-Net) of Landsat data for pastureland mapping, as part of my PhD (https://github.com/lapig-ufg/dl4landsat).

Basically, the data acquired by the Suomi NPP and JPSS-1 satellites will indicate constantly hot regions (one square kilometer pixels). Inside those regions, a deep learning solution will identify which facility is responsible for the nighttime infrared detection, using high-res imagery (WorldView-3 or whatever). Right? I only have some questions:

  1. Does the final solution need to process the NPP/JPSS-1 data, identify the constantly hot regions, and retrieve the high-res imagery for the identified regions (an end-to-end solution), or will it focus only on the classification approach?
  2. Can part of the 10,000+ persistent high temperature nighttime combustion sources be used to train/calibrate the NN algorithm, based on a classical machine learning data stratification: train (70%), test (15%) and validation (15%)?
  3. What are the facility classes? Oil fields, refineries, steel mills and biomass fires?
  4. The main output of the solution is which facility is responsible for the detection? The geolocation of this facility is desirable but less relevant, right?

Thanks very much,

Bests, Leandro Leal Parente

zhizhin commented 6 years ago

Hello Leandro,

Thank you for your interest in our problem.

1. The final solution DOES NOT need to process SNPP/JPSS-1 data. We have solved that part and it has been running operationally for 5 years already.

2. We have manually attributed flares from the past years. Every year we get a new set, and it would be interesting to automate the attribution.

3. The primary interest of my group is gas flares in oil fields, refineries and liquefied natural gas (LNG) terminals. But the total set of IR nighttime sources is richer: there are steel mills, chemical plants, volcanoes, etc. The NARROW problem is to select from this variety the gas flares / refineries / LNGs with high probability (no missed targets, no added errors).

4. The main output is the IR source type (class). Geolocation may be “enhanced”, but we have it already with 300 m certainty, which is sufficient for our applications.

Good luck

Misha
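The narrow selection problem in point 3 above ("no missed targets, no added errors") corresponds to recall and precision when the task is viewed as target versus non-target. A minimal sketch follows; the class names in `TARGET_CLASSES`, the function name, and the example labels are assumptions for illustration, not labels taken from the actual dataset.

```python
# Target classes from point 3; the exact label strings are hypothetical.
TARGET_CLASSES = {"gas flare", "refinery", "LNG terminal"}


def precision_recall(expert_labels, predicted_labels, targets=TARGET_CLASSES):
    """Score the selection of target classes against the expert labels.
    Recall penalizes missed targets, precision penalizes added errors."""
    pairs = list(zip(expert_labels, predicted_labels))
    tp = sum(e in targets and p in targets for e, p in pairs)
    fp = sum(e not in targets and p in targets for e, p in pairs)
    fn = sum(e in targets and p not in targets for e, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels for four sources.
expert = ["gas flare", "steel mill", "refinery", "volcano"]
predicted = ["gas flare", "gas flare", "steel mill", "volcano"]
print(precision_recall(expert, predicted))
```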

leandroleal commented 6 years ago

Hi Dr. Zhizhin,

I'm working on my proposal and have some other questions:

  1. Can there be more than one IR nighttime source inside one constantly hot region? For instance, can multiple gas flares in oil fields and refineries be located inside one square kilometer pixel?
  2. Did you find any correlation between the heat sources and the SNPP/JPSS-1 pixel values?
  3. How do you actually decide, manually, which facility class is responsible for the IR nighttime detection?

Thanks very much,

Bests, Leandro Leal Parente

zhizhin commented 6 years ago

Dear Leandro,

To answer your questions:

  1. Can there be more than one IR nighttime source inside one constantly hot region? For instance, can multiple gas flares in oil fields and refineries be located inside one square kilometer pixel?

We need to classify the sources. If there is more than one flare in that pixel, set them all to flares. In the mixed case, set them all to the most probable class.

  2. Did you find any correlation between the heat sources and the SNPP/JPSS-1 pixel values?

Here I do not understand exactly what you mean. We use SNPP multispectral images to detect the heat sources. Some of them (large fires, volcano eruptions, etc.) can take up more than one pixel. But these are transient. Our prime interest is in persistent heat sources of subpixel size. So, from SNPP / JPSS-1 we already know that there is something hot under that pixel. If several images (from different orbits) confirm the same location, we take the average of the pixel coordinates and temperatures derived from the individual images. This improves the heat source location precision to 50 m and the temperature estimates to ±50 K.

  3. How do you actually decide, manually, which facility class is responsible for the IR nighttime detection?

We use Google Maps/Earth and terraserver.com high resolution satellite images for that location. If you see tanks, it is an oil refinery. If you see stocks of coal and ore, it is a steel mill. And so on.
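The averaging of repeated detections described above can be sketched as follows. This is a simplified illustration, not the operational Nightfire code: the tuple layout and the example values are made up, and a plain mean of longitudes is assumed to be acceptable because all detections of one source lie close together and away from the antimeridian.

```python
from statistics import mean


def average_detections(detections):
    """Average repeated detections of one persistent source.
    Each detection is a (lat_deg, lon_deg, temperature_k) tuple."""
    if not detections:
        raise ValueError("need at least one detection")
    lats, lons, temps = zip(*detections)
    return mean(lats), mean(lons), mean(temps)


# Made-up detections of the same source from three orbits.
detections = [
    (48.0000, -103.0000, 1800.0),
    (48.0020, -103.0020, 1700.0),
    (48.0010, -103.0010, 1750.0),
]
print(average_detections(detections))
```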

All the best

Misha


fenilsuchak commented 6 years ago

Hello @zhizhin! I have submitted a draft proposal for GSoC'18. Could you please review it and leave any feedback on the draft proposal (Google Doc), which I have shared via the GSoC page? I hope you have received the link.

Thank you.

zhizhin commented 6 years ago

Hello fenx,

I am currently traveling and will read the GSoC docs later this week.

Misha


leandroleal commented 6 years ago

Hi Dr. @zhizhin,

I submitted my draft proposal, following the ESIPFed Application Template (https://github.com/ESIPFed/GSoC/wiki/Application-Template). If possible, can you review it? It would be good if I could clarify some aspects of the proposal before my final submission.

Thanks very much.

Bests, Leandro Leal Parente

fenilsuchak commented 6 years ago

This project idea wasn't listed in the GSOC project ideas list. Is there an error?

zhizhin commented 6 years ago

No errors

The idea got the largest number of applications in GSoC but was not approved by the GSoC manager.

Misha


fenilsuchak commented 6 years ago

@zhizhin Oh, that's sad. Any particular reason for not approving it?

zhizhin commented 6 years ago

GSoC did not have enough project slots to fit this idea.

Misha


esip-lab commented 5 years ago

@zhizhin, would you like to update and re-open this issue to try to get a student for GSoC 2019?