akalikadien / shell-ai-hackaton-2023

Repo for the agricultural waste challenge of Shell.ai 2023 https://www.shell.com/energy-and-innovation/digitalisation/digital-and-ai-competitions/shell-ai-hackathon-for-sustainable-and-affordable-energy.html

Please check distance matrix #6

Closed vsinha027 closed 1 year ago

vsinha027 commented 1 year ago

https://github.com/akalikadien/shell-ai-hackaton-2023/blob/96f4d4721210dd5bfbfc90faf08dc17a9c5ca04f/place_storage_and_refineries_optimize_flow.py#L97 The distance matrix created here should hold the distance between harvest site i and the site j where a depot is placed. I don't think this code does that. Here is some ugly code that does. It first gets the indices of the sites that correspond to the depots. Then, using two nested for loops, it fills dist_sites_to_depots from the distance matrix provided by Shell.

    for i in range(self.num_depots):
        # Find the site whose coordinates match depot i; .values gives an array (one element if exactly one site matches)
        self.depot_indices.append(self.biomass_df[(self.biomass_df['Latitude'] == self.depot_cluster_centers[i,0]) & (self.biomass_df['Longitude'] == self.depot_cluster_centers[i,1])]['Index'].values)

    # Create distance matrices for harvesting sites to depots and depots to refineries
    for i in range(0,2418):
        for j in range(0, self.num_depots):
            self.dist_sites_to_depots[i,j] = distance_matrix[str(i)][self.depot_indices[j]]

vsinha027 commented 1 year ago

The same holds true for biorefineries and depots. Please be aware that if you read the distance matrix csv file as a pandas dataframe, the first column, which holds the indices, is unnamed, and the rest of the columns, which hold the distances, are named after the indices, i.e. 0, 1, 2, etc. I used the following ugly way to get around this issue. You can do the same and we can pretend that it never happened.

    import re

    for i in range(0,self.num_depots):
        for j in range(0,self.num_biorefineries):
            # Strip the brackets from the array's string form, e.g. "[12]" -> "12", to get the column name
            s1 = re.sub(r"[\[\]]","",str(self.depot_indices[i]))
            self.dist_depots_to_refineries[i,j] = distance_matrix[s1][self.refinery_indices[j]]
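A possible way to avoid both the unnamed column and the string column names is sketched below on a toy 3x3 CSV. The values and the `csv_text` and `distance_df` names are made up for illustration; the real input is the distance matrix file provided by Shell. The idea is to read the file with `index_col=0`, so the first column becomes the row index, and then convert to a NumPy array so rows and columns can both be addressed by integer position.

```python
import io

import pandas as pd

# Toy stand-in for the distance matrix CSV. The first header cell is empty,
# which is why pandas would otherwise treat it as an unnamed data column.
csv_text = """,0,1,2
0,0.0,1.5,2.5
1,1.5,0.0,4.0
2,2.5,4.0,0.0
"""

# index_col=0 turns the first column into the row index instead of a data column.
distance_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# The column labels are still strings ("0", "1", ...), so convert to a plain
# array and index by integer position from here on.
distance = distance_df.to_numpy()

print(distance_df.columns.tolist())  # column labels as read from the header
print(distance[0, 2])                # row 0, column 2 of the toy matrix
```

With a plain array, neither `str(i)` nor the bracket-stripping regex is needed when looking up a distance.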

akalikadien commented 1 year ago

You can also just drop the unnamed column, right? That's what I did now. Anyway, I thought of a more efficient approach to your code. What about this?

Create a dictionary that maps the depot cluster center coordinates to their corresponding indices

    depot_centers_indices = {}
    for i in range(self.num_depots):
        depot_center = tuple(self.depot_cluster_centers[i, :2])  # Extract Latitude and Longitude
        depot_indices = self.biomass_df[
            (self.biomass_df['Latitude'] == depot_center[0]) &
            (self.biomass_df['Longitude'] == depot_center[1])
        ]['Index'].values
        depot_centers_indices[depot_center] = depot_indices

Then populate the distance matrices

    for i in range(self.n_sites):
        for j in range(self.num_depots):
            depot_indices = depot_centers_indices[tuple(self.depot_cluster_centers[j, :2])]
            self.dist_sites_to_depots[i, j] = distance_matrix.iloc[i, depot_indices].min()

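    for j in range(self.num_depots):
        # Look up this depot's indices again; reusing depot_indices from the loop above would always give the last depot
        depot_indices = depot_centers_indices[tuple(self.depot_cluster_centers[j, :2])]
        for k in range(self.num_biorefineries):
            # depot_centers_df is not defined in this snippet; it is assumed to hold the coordinates the refineries are matched against
            refinery_indices = depot_centers_df[
                (depot_centers_df['Latitude'] == self.refinery_cluster_centers[k, 0]) &
                (depot_centers_df['Longitude'] == self.refinery_cluster_centers[k, 1])
            ].index
            # The selection is 2-D, so reduce it to a scalar via the underlying array
            self.dist_depots_to_refineries[j, k] = distance_matrix.iloc[depot_indices, refinery_indices].to_numpy().min()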
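
The nested loops could also be replaced by NumPy fancy indexing. The sketch below assumes the distance matrix is already a square NumPy array and that every depot and refinery resolves to exactly one site index. The names `distance`, `depot_idx` and `refinery_idx` and all values are hypothetical toy data, not taken from the repository.

```python
import numpy as np

# Toy 5x5 distance matrix between sites (made-up values).
rng = np.random.default_rng(0)
distance = rng.uniform(1.0, 100.0, size=(5, 5))
np.fill_diagonal(distance, 0.0)

# Hypothetical site indices at which depots and refineries are placed.
depot_idx = np.array([1, 3])
refinery_idx = np.array([4])

# All sites (rows) against the depot sites (columns): shape (n_sites, n_depots).
dist_sites_to_depots = distance[:, depot_idx]

# Depot sites (rows) against refinery sites (columns): shape (n_depots, n_refineries).
# np.ix_ builds the open mesh needed to select a row/column sub-block.
dist_depots_to_refineries = distance[np.ix_(depot_idx, refinery_idx)]

print(dist_sites_to_depots.shape)
print(dist_depots_to_refineries.shape)
```

This keeps the row/column orientation explicit in one place, which matters if the provided matrix is not symmetric.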