SWS-Methodology / faoswsTrade

World trade data processing for the FAO Statistical Working System
http://www.fao.org/economic/ess/ess-home/en/

Create Shiny app to track full process of HS->FCL mapping for individual items #126

Open · malexan opened this issue 7 years ago

malexan commented 7 years ago

Currently, to understand the root causes of mapping problems for specific items, we need to jump between different CSV files. It would be very handy to have a Shiny web app that lets us quickly switch between the different stages of the mapping process for individual items.

One technical question is where to host the app for users from the B/C team.

malexan commented 7 years ago

Let's say we want to check the situation with Canada (#78).

  1. The file tldata_fulldata_nolinks_byreporter.csv shows the total number of unmatched records (1436), their proportion of all records (5.76%), the total value of unmatched records (3325547462), and their proportion of total value (4%).
  2. After filtering tldata_fulldata_nolinks_byreporterhs6.csv by Canada and sorting by the total value of unmatched records, we see 38 different HS6 codes not mapped to FCL. Each HS6 code can contain many HS+ codes. The unmatched HS6 code with the largest value is 190590. It appears in 103 records, i.e. 55% of all records with this HS6 code. The unmatched value under this HS6 code is 1112168931, which is 50% of the total value under it.
  3. tldata_hsfcl_nolinks.csv shows one link record with HS 190590 (id 118). It belongs to flow 1 and was extended to 1905900000. Its presence in this table means that it wasn't suitable for mapping at the HS6 level and was passed to the traditional HS+ approach.
  4. Now we want to check the mapping table. We don't have a suitable report for it, so we load the table in R. For Canada, flow 1, there are 91 correspondence records that start with 190590: 78 point to FCL 22, 12 to FCL 111, and one to FCL 20. This multiple correspondence prevents the HS6 code from being handled by the HS6 approach, so it is passed to the traditional HS+ approach, where matching fails.
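The manual checks in steps 2 and 4 can be sketched in a few lines. This is a minimal Python illustration under loud assumptions: the record layout (`TradeRecord` with `reporter`, `hs`, `value`, `fcl`), the mapping-row tuples `(reporter, flow, hs_from, fcl)` and the helper names `unmatched_by_hs6`, `fcl_targets` and `hs6_mappable` are all hypothetical and do not reflect faoswsTrade's actual column names or its R code. The only rule taken from the walkthrough is that an HS6 code with more than one FCL target is not handled by the HS6 approach.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class TradeRecord:
    """Hypothetical, simplified tariff-line record."""
    reporter: str
    hs: str               # HS+ code as reported (6 or more digits)
    value: float          # trade value
    fcl: Optional[str]    # mapped FCL code, or None if unmatched


def unmatched_by_hs6(records, reporter):
    """Per HS6 code of one reporter: (unmatched records, share of records,
    unmatched value, share of value), sorted by unmatched value, largest first."""
    totals = defaultdict(lambda: [0, 0, 0.0, 0.0])  # n_all, n_unmatched, v_all, v_unmatched
    for r in records:
        if r.reporter != reporter:
            continue
        t = totals[r.hs[:6]]
        t[0] += 1
        t[2] += r.value
        if r.fcl is None:
            t[1] += 1
            t[3] += r.value
    rows = {
        hs6: (n_unm, n_unm / n_all, v_unm, v_unm / v_all if v_all else 0.0)
        for hs6, (n_all, n_unm, v_all, v_unm) in totals.items()
        if n_unm > 0
    }
    return dict(sorted(rows.items(), key=lambda kv: kv[1][2], reverse=True))


def fcl_targets(mapping, reporter, flow, hs6):
    """Count FCL codes among mapping rows for (reporter, flow) whose HS code starts with hs6."""
    return Counter(
        fcl for (rep, fl, hs_from, fcl) in mapping
        if rep == reporter and fl == flow and hs_from.startswith(hs6)
    )


def hs6_mappable(mapping, reporter, flow, hs6):
    # Assumption from step 4: the HS6 approach applies only if all rows agree on one FCL code.
    return len(fcl_targets(mapping, reporter, flow, hs6)) == 1


# Toy data mirroring the Canada case (codes and values are illustrative only).
records = [
    TradeRecord("CAN", "1905900010", 60.0, None),
    TradeRecord("CAN", "1905900020", 40.0, "22"),
    TradeRecord("USA", "1905900010", 5.0, None),
]
mapping = (
    [("CAN", 1, "1905900000", "22")] * 78
    + [("CAN", 1, "1905900000", "111")] * 12
    + [("CAN", 1, "1905900000", "20")]
)

print(unmatched_by_hs6(records, "CAN"))                   # {'190590': (1, 0.5, 60.0, 0.6)}
print(dict(fcl_targets(mapping, "CAN", 1, "190590")))     # {'22': 78, '111': 12, '20': 1}
print(hs6_mappable(mapping, "CAN", 1, "190590"))          # False
```

A Shiny app could expose exactly these two views (unmatched share per HS6, FCL targets per HS6) side by side, replacing the jump between CSV files and an ad hoc R session.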

Conclusions on the case:

  1. Technically, the current mapping algorithms work as expected and without errors. They cannot map this HS code because of the specifics of the raw trade data.
  2. The case could be solved by requesting more detailed Canada data, as Katherina suggested. Alternatively, it could be solved by introducing a probabilistic approach (HS 190590 goes to FCL 22 because most of the mapping records for this HS6 code point to it).
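The probabilistic approach in point 2 could, for instance, be a majority vote: assign the FCL code that most mapping rows for the HS6 code point to. The sketch below is hypothetical and is not implemented in faoswsTrade; the function name `majority_fcl`, the `min_share` threshold and the tie rule are assumptions added to keep the fallback conservative.

```python
from collections import Counter
from typing import Iterable, Optional


def majority_fcl(fcl_codes: Iterable[str], min_share: float = 0.5) -> Optional[str]:
    """Return the most frequent FCL code if its share of rows exceeds min_share, else None.

    Hypothetical majority-vote rule: ties or weak majorities return None, so the
    record stays unmatched and can be reviewed manually.
    """
    counts = Counter(fcl_codes)
    total = sum(counts.values())
    if total == 0:
        return None
    top = counts.most_common(2)
    best, best_n = top[0]
    if len(top) > 1 and top[1][1] == best_n:
        return None  # tie: no clear winner
    return best if best_n / total > min_share else None


# FCL targets of the 91 Canada, flow 1 mapping rows starting with 190590.
rows = ["22"] * 78 + ["111"] * 12 + ["20"]
print(majority_fcl(rows))  # 22 (78 of 91 rows, about 86%)
```

Raising `min_share` trades coverage for safety: with `min_share=0.9`, for example, HS 190590 would stay unmatched, since FCL 22 accounts for only about 86% of its mapping rows.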

Conclusion on the Shiny app development:

  1. All these operations, currently carried out manually, could be implemented in a Shiny app, saving working time for developers and data clerks.

carola-f commented 7 years ago

The application should display item labels extracted from the WB database to help users map items correctly to FCL/CPC codes.

chrMongeau commented 7 years ago

> The application should display item labels extracted from the WB database

For reference: http://wits.worldbank.org/data/public/CMTTL_TarifflineDescription.zip