chews0n / glowing-waffle

SPE Calgary Data Science Mentorship program 2021
Apache License 2.0

Read in the data sets and create a unified data frame for each well in the list #15

Closed chews0n closed 3 years ago

chews0n commented 3 years ago

The data sets are spread across multiple files, and the column that links records together differs depending on the file being read.

Figure out a way to cleanly combine these files so that the data is easier to work with down the line and can be filtered by well and by formation (Montney).

chews0n commented 3 years ago

Since we're only considering the Montney, filter out everything not related to:

"Area Code","Area Desc","Area Eff Date","Area Term Date","Desgntd Field Flag"
"6200","MONTNEY","20160625","","Y"

Eg. In BC Total Production.csv:

chews0n commented 3 years ago

Looks like there are multiple areas for Montney:

"9022","NORTHERN MONTNEY","20110921","","Y"

and multiple formations:

"FORT","4990","BLUESKY-GETHING-MONTNEY"
"FORT","4995","LOWER CHARLIE LAKE/MONTNEY"
"FORT","4997","DOIG PHOSPHATE-MONTNEY"
"FORT","5000","MONTNEY"
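
A minimal sketch of one way to do this filtering, assuming that matching on the text "MONTNEY" in the description column is acceptable. The helper name `filter_montney` and the non-Montney row are made up for illustration; only the "Area Code" and "Area Desc" column names and the two Montney rows come from the data quoted above.

```python
import pandas as pd

# Toy stand-in for the area lookup table. The last row is invented
# so that there is something to filter out.
areas = pd.DataFrame(
    {
        "Area Code": ["6200", "9022", "1234"],
        "Area Desc": ["MONTNEY", "NORTHERN MONTNEY", "HYPOTHETICAL OTHER"],
    }
)


def filter_montney(df, column):
    """Keep only the rows whose `column` text mentions MONTNEY."""
    mask = df[column].astype(str).str.contains("MONTNEY", case=False, regex=False)
    return df[mask].reset_index(drop=True)


montney_areas = filter_montney(areas, "Area Desc")

# The surviving codes can then be used to filter other files with .isin().
montney_codes = set(montney_areas["Area Code"])
```

Note that a substring match would also keep the mixed formations such as "DOIG PHOSPHATE-MONTNEY". Whether those should count as Montney is a decision for the project, not something this sketch settles.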

chews0n commented 3 years ago

@BDanyluik hi brendan, how's it going....

chews0n commented 3 years ago

If you check commit 09172e8, I have added a basic parser for the latitude and longitude of the well's surface location so that you can get a rough idea of what should be done. Keep in mind a few things:

  1. We are adding columns to the dataframe ScrapeOGC.feature_list by merging on the well authorisation number column (check each file's spelling of this column, as it changes from file to file).
  2. You can perform the merge of the dataframes using the following line of code: self.feature_list = pd.merge(self.feature_list, filtered_df, how="left", on=['Well Authorization Number']) This creates the merged list and assumes that the WA numbers already match and that filtered_df has already been filtered down to only the columns you need to add.
  3. Remember that all of the files downloaded from OGC have already been read in as dataframes. They are stored in the dictionary ScrapeOGC.dataframes_dict, using the file name as the key (e.g. 'wells.csv').
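
The three points above can be sketched roughly as follows. This is a toy example under stated assumptions, not the code from the commit: the helper `normalise_wa_column`, the sample values, and the "Latitude", "Longitude" and "Unused" column names are hypothetical, and only the s/z spelling difference in the key column is handled.

```python
import pandas as pd

WA_COLUMN = "Well Authorization Number"


def normalise_wa_column(df):
    """Rename an 'authorisation' spelling of the key column to WA_COLUMN."""
    renames = {
        col: WA_COLUMN
        for col in df.columns
        if col.strip().lower().replace("authorisation", "authorization")
        == WA_COLUMN.lower()
    }
    return df.rename(columns=renames)


# Stand-in for ScrapeOGC.feature_list: one row per well of interest.
feature_list = pd.DataFrame({WA_COLUMN: [100, 101, 102]})

# Stand-in for one downloaded file, with the alternate key spelling
# and one column we do not want to carry over.
surface = pd.DataFrame(
    {
        "Well Authorisation Number": [100, 102, 999],
        "Latitude": [56.1, 56.3, 55.0],
        "Longitude": [-121.0, -121.4, -120.0],
        "Unused": ["a", "b", "c"],
    }
)

# Filter down to the key plus the columns to add, then left-merge so that
# every well already in feature_list is kept.
filtered_df = normalise_wa_column(surface)[[WA_COLUMN, "Latitude", "Longitude"]]
feature_list = pd.merge(feature_list, filtered_df, how="left", on=[WA_COLUMN])
```

With how="left", well 101 stays in the result with missing latitude and longitude, and well 999 is dropped because it is not in the feature list.
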
chews0n commented 3 years ago

Take a look at commit 2fb6cbb: there is now a function called read_well_data that takes a dictionary and reads in, for each file, the list of headers given to it. This should make things easier: you just have to list out the wells and headers you need in that dictionary and then do some calculations if your feature is a combination of multiple headers.
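
The signature of read_well_data is not shown in this thread, so the following is only a sketch of the general idea with a hypothetical helper, `collect_columns`, that maps file names to the headers wanted from each file. The file 'example_frac.csv' and all column names other than the well authorization number are invented for illustration.

```python
import pandas as pd

WA_COLUMN = "Well Authorization Number"


def collect_columns(dataframes_dict, headers_by_file, feature_list):
    """Left-merge the requested headers from each file onto feature_list."""
    for file_name, headers in headers_by_file.items():
        df = dataframes_dict[file_name]
        # Keep the key plus the requested headers, one row per well.
        subset = df[[WA_COLUMN] + list(headers)].drop_duplicates(subset=WA_COLUMN)
        feature_list = pd.merge(feature_list, subset, how="left", on=[WA_COLUMN])
    return feature_list


# Stand-in for ScrapeOGC.dataframes_dict, keyed by file name.
dataframes_dict = {
    "wells.csv": pd.DataFrame(
        {WA_COLUMN: [1, 2], "Total Depth": [3000.0, 3500.0], "Operator": ["A", "B"]}
    ),
    "example_frac.csv": pd.DataFrame(
        {WA_COLUMN: [1, 2], "Proppant Total": [100.0, 300.0], "Stage Count": [10, 20]}
    ),
}

# Which headers to pull from which file.
headers_by_file = {
    "wells.csv": ["Total Depth"],
    "example_frac.csv": ["Proppant Total", "Stage Count"],
}

feature_list = pd.DataFrame({WA_COLUMN: [1, 2]})
feature_list = collect_columns(dataframes_dict, headers_by_file, feature_list)

# A feature that combines multiple headers is computed after the merge.
feature_list["Proppant Per Stage"] = (
    feature_list["Proppant Total"] / feature_list["Stage Count"]
)
```

Keeping the file-to-headers mapping in one dictionary means adding a new feature is mostly a matter of adding an entry, with any derived columns computed once everything is merged.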