Closed d-saikrishna closed 8 months ago
@shreyaagrawal0809 could you add the scraper code here: Link
With the output of the scraper code here: Link
I've added two scripts:
- `flood_tenders.py`: Identifies flood tenders out of all tenders and extracts relevant metadata.
- `geocode_district.py`: Geocodes the district of the tender.

Will add one more script for geocoding the revenue circles of tenders.
We are not able to geocode all districts. We have to decide on a methodology to deal with the tenders that remain. Previously, we geocoded those tenders manually.
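The flood-tender filtering step might look roughly like the sketch below. This is only an illustration, not the contents of `flood_tenders.py`: the keyword list, the field names `title` and `work_description`, and the function name `is_flood_tender` are all assumptions.

```python
import re

# Hypothetical keywords; the list actually used by flood_tenders.py is not given in this thread.
FLOOD_KEYWORDS = ["flood", "embankment", "erosion"]
FLOOD_PATTERN = re.compile("|".join(re.escape(k) for k in FLOOD_KEYWORDS), re.IGNORECASE)


def is_flood_tender(tender):
    """Return True if any keyword appears in the tender's text fields (field names assumed)."""
    text = " ".join(str(tender.get(field) or "") for field in ("title", "work_description"))
    return bool(FLOOD_PATTERN.search(text))


tenders = [
    {"title": "Repair of Embankment", "work_description": None},
    {"title": "Supply of office furniture"},
]
flood_tenders = [t for t in tenders if is_flood_tender(t)]
```

Plain substring matching is deliberately loose and will also match words such as "floodlight", so a real filter would likely need exclusions or manual review.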
@d-saikrishna - let's have a call to discuss the geocoding part.
What percentage of tenders were you able to geocode?
For ~3000 of ~4000 tenders, the district is identified. For another ~350 tenders, multiple districts are identified. These are 'CONFLICT' tenders.
For revenue circle identification, the number will be lower. Will update.
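The district matching and the 'CONFLICT' label described above could be sketched as follows. This is a minimal illustration, not the actual `geocode_district.py`: the short district list, the function name `identify_district`, and the whole-word, case-insensitive matching are assumptions.

```python
import re

# Hypothetical subset of district names; the real lookup table is not shown in this thread.
DISTRICTS = ["Barpeta", "Cachar", "Dhemaji", "Nagaon"]


def identify_district(text, districts=DISTRICTS):
    """Return the single matched district, 'CONFLICT' if several match, or None if none do."""
    found = [
        d for d in districts
        if re.search(rf"\b{re.escape(d)}\b", text, flags=re.IGNORECASE)
    ]
    if not found:
        return None
    if len(found) == 1:
        return found[0]
    return "CONFLICT"
```

Tenders that come back as `None` or `"CONFLICT"` are the ones that would still need a decision or manual geocoding.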
Need to decide on the following wrt geotagging tenders:
- `CONFLICT` districts after geotagging districts?
- `location` column to geotag revenue circle?

Meanwhile, I'm trying to expand the villages dataset by combining other sources so that there can be more absolute matches.
Number of tenders whose revenue circle could not be geo-tagged: 1192
This number can be reduced by:
- using the `location` column
- resolving the `CONFLICT` districts

New logic for geo-tagging revenue circles:
- The `tender_revenueci` column is based on the `title`, `work description` and `extReference ID` columns.
- The `tender_revenueci_location` column is based on the `location` column.
- If the RC in the `tender_revenueci_location` column is a HeadQuarter, then we flag it accordingly in the `HQ_flag` column.
- IF `tender_revenueci_location` remains `null`, then the RC in `tender_revenueci` is decided as FINAL.
- IF `HQ_flag` == False, then the RC in `tender_revenueci_location` is decided as FINAL.
- IF (`HQ_flag` == True) AND (`tender_revenueci_location` == `tender_revenueci`), then the RC in `tender_revenueci_location` is decided as FINAL.
- Yet to decide: IF (`HQ_flag` == True) AND (`tender_revenueci_location` != `tender_revenueci`).
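The rules above can be sketched as a single function. This is an illustration under assumptions, not the project's actual code: the function name `decide_final_rc`, the status labels `UNDECIDED` and `MISSING`, and the placeholder RC names are hypothetical, and `None` stands in for a null column value.

```python
def decide_final_rc(tender_revenueci, tender_revenueci_location, hq_flag):
    """Return (final_rc, status) for one tender, following the rules listed above."""
    if tender_revenueci_location is None:
        # No location-based match: fall back to the text-based RC.
        if tender_revenueci is None:
            # Assumption: neither column has a match, so the tender stays un-tagged.
            return None, "MISSING"
        return tender_revenueci, "FINAL"
    if not hq_flag:
        # Location-based RC is not a headquarter: trust it.
        return tender_revenueci_location, "FINAL"
    if tender_revenueci_location == tender_revenueci:
        # Headquarter match, but the text-based RC agrees with it.
        return tender_revenueci_location, "FINAL"
    # Headquarter match that disagrees with the text-based RC: rule not yet decided.
    return None, "UNDECIDED"


# Placeholder RC names, for illustration only.
print(decide_final_rc("RC_A", None, False))
print(decide_final_rc("RC_A", "RC_B", True))
```

Returning an explicit status keeps the undecided headquarter conflicts separate from tenders that simply have no match, so they can be counted and reviewed on their own.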
A few decisions taken on the TENDERS data source:
- [`AOC` Tenders] Previously we took all tenders, even cancelled tenders. Accordingly, I'm scraping only `AOC` tenders from the website.

New tender stats accordingly. For the `AOC` tenders scraped between April 2016 and September 2023:
Biswajeet: The geo-tagging exercise for 2023 tenders is complete. Total missing RCs - 84. RCs geotagged manually - 68. Tenders for which RCs cannot be determined - 16.
Should now create variables from the tenders data sources.
All variables for TENDERS processed until September 2023
Need to reclassify tenders:
Sub-thread of #8 for Assam Tenders
@shreyaagrawal0809 will provide the scraper scripts, updated with a column for `time of scraping`.
I will provide scripts for data transformation.
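Adding the time-of-scraping column could be as small as the sketch below. It assumes the scraper yields one dict per tender. The column name `scraped_at`, the function name `add_scrape_time`, and the choice of an ISO 8601 UTC timestamp are assumptions, not what the scraper scripts actually do.

```python
from datetime import datetime, timezone


def add_scrape_time(rows, scraped_at=None):
    """Return copies of the scraped rows with a 'scraped_at' ISO 8601 timestamp added."""
    # Use one timestamp for the whole batch so all rows from a run share the same value.
    stamp = (scraped_at or datetime.now(timezone.utc)).isoformat()
    return [{**row, "scraped_at": stamp} for row in rows]


rows = add_scrape_time([{"tender_id": "T1"}, {"tender_id": "T2"}])
```

Stamping per run rather than per row makes it easy to group tenders by scraping batch later in the data transformation scripts.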