Open Monireh2 opened 2 years ago
View / edit / reply to this conversation on ReviewNB
frreiss commented on 2022-02-01T00:39:46Z ----------------------------------------------------------------
Web link to Cloud Pak for Data is not rendering properly on ReviewNB. Is there a typo in the Markdown?
Monireh2 commented on 2022-02-02T02:32:45Z ----------------------------------------------------------------
The link was working on my local machine, and also here in ReviewNB when I clicked it. I think it was not working for you because of a newline at the start of the URL. Just fixed it. Thanks for pointing that out.
View / edit / reply to this conversation on ReviewNB
frreiss commented on 2022-02-01T00:39:47Z ----------------------------------------------------------------
"We start" ==> "Allison starts"
Monireh2 commented on 2022-02-02T02:33:40Z ----------------------------------------------------------------
fixed, thanks!
View / edit / reply to this conversation on ReviewNB
frreiss commented on 2022-02-01T00:39:48Z ----------------------------------------------------------------
There's no need to embed Python code to display the video. You can directly embed the video file into the Markdown in the previous cell. Syntax:
<video controls src="./images/Table_Understanding.mp4">Creating a collection in IBM Watson Discovery</video>
Documentation here: https://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/Working%20With%20Markdown%20Cells.html#Local-files
Overall the video looks good, but I do have some suggestions:
* You need to blur/black out the PII -- user names, people's names, account names. Instructions here: https://www.youtube.com/watch?v=54KYsEVJlWQ.
* If you have time to re-record the clip, I think it would work better if you shrunk the browser window to a smaller size and just recorded the window (Press command-shift-5 to select a portion of the screen to record).
* I recommend you edit out or speed up the parts where you're waiting for Discovery to perform an action.
Monireh2 commented on 2022-02-02T02:42:51Z ----------------------------------------------------------------
Thanks for the pointer, Fred (@frreiss). I actually tried that, but it gives me a black screen with an inactive play button. The only way I could resolve the issue was by using the Python snippet above. I will address your other comments.
View / edit / reply to this conversation on ReviewNB
frreiss commented on 2022-02-01T00:39:48Z ----------------------------------------------------------------
I think it would be better to move this cell and the ones that follow (up to the heading, "Query the project") to a separate notebook file to avoid breaking up the flow. You can put a hyperlink to the other notebook file directly into your Markdown, i.e.
For more information, refer to [this additional notebook](./other_notebook.ipynb)
View / edit / reply to this conversation on ReviewNB
frreiss commented on 2022-02-01T00:39:49Z ----------------------------------------------------------------
Can you truncate this output a bit? Maybe print out the first 20 lines, followed by something like "[200 more lines]"?
Monireh2 commented on 2022-02-02T16:43:53Z ----------------------------------------------------------------
done!
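For reference, the truncation suggested above could look something like the sketch below. The helper name `truncate_lines` and the sample output are made up for illustration; this is not the code that ended up in the notebook.

```python
def truncate_lines(text: str, max_lines: int = 20) -> str:
    """Keep the first `max_lines` lines and summarize how many were dropped."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    hidden = len(lines) - max_lines
    return "\n".join(lines[:max_lines] + [f"[{hidden} more lines]"])


# Hypothetical long output standing in for the real cell output.
long_output = "\n".join(f"row {i}" for i in range(220))
print(truncate_lines(long_output))
```

Printing the truncated string instead of the raw object keeps the diff in ReviewNB short while still showing that more data exists.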
View / edit / reply to this conversation on ReviewNB
frreiss commented on 2022-02-01T00:39:50Z ----------------------------------------------------------------
This table is rendering as empty (no body cells) in ReviewNB.
Monireh2 commented on 2022-02-03T00:27:23Z ----------------------------------------------------------------
That is weird. It renders for me on my local machine.
View / edit / reply to this conversation on ReviewNB
frreiss commented on 2022-02-01T00:39:51Z ----------------------------------------------------------------
The data shown doesn't match the screenshot. The screenshot shows 2013-2014 data; the data here is for 2014-2014.
Monireh2 commented on 2022-02-03T00:28:36Z ----------------------------------------------------------------
Resolved. It had changed in the final run!
View / edit / reply to this conversation on ReviewNB
frreiss commented on 2022-02-01T00:39:51Z ----------------------------------------------------------------
Those error messages ('ERROR READING VALUE:"" Filling with <NA>') shouldn't be there. Can you track down the root cause and open a GitHub issue with code/data to reproduce?
Monireh2 commented on 2022-02-03T00:43:15Z ----------------------------------------------------------------
@frreiss: The error does make sense to me. Whenever the code gets an empty value, it substitutes pd.NA for the value and prints the above error. I can open an issue on that if you think the code should be changed.
Please see lines 229-231 here:
https://github.com/CODAIT/text-extensions-for-pandas/blob/master/text_extensions_for_pandas/io/watson/tables.py
    except ValueError:
        ans = pd.NA
        print(f"ERROR READING VALUE:\"{val}\"\t Filling with <NA>")
Here the value for "Major markets", "Growth Markets" and "BRIC countries" is empty.
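If the code were to change, one possible direction is sketched below: treat empty cells as missing without a message, and only warn when non-empty text fails to parse. This is not the code in tables.py. The function name `convert_cell` is hypothetical, and `None` stands in for `pd.NA` so the sketch runs without pandas.

```python
import warnings

MISSING = None  # stand-in for pd.NA (assumption, keeps the sketch stdlib-only)


def convert_cell(val: str):
    """Hypothetical converter: empty cells become MISSING silently;
    only non-empty text that cannot be parsed triggers a warning."""
    stripped = val.strip()
    if stripped == "":
        return MISSING
    try:
        # Drop thousands separators and a currency sign before parsing.
        return float(stripped.replace(",", "").replace("$", ""))
    except ValueError:
        warnings.warn(f"Could not parse {val!r}; filling with <NA>")
        return MISSING


print(convert_cell(""))        # empty cell, no warning
print(convert_cell("$5,247"))  # parsed as a number
```

With this split, header-only rows such as "Major markets" would stay quiet, while genuinely malformed values would still be reported.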
View / edit / reply to this conversation on ReviewNB
frreiss commented on 2022-02-01T00:39:52Z ----------------------------------------------------------------
Several incorrect values are present in this table: "of intellectual property", "Licensing/royalty-based fees", "Custom development income", "2009. The increase in total expense and other", "Examples of the company's investments include:", NaN, "Industry sales skills to support Smarter Planet".
These incorrect values most likely come from incorrect JSON input from Watson Discovery. Can you please trace them back to the corresponding portions of the Watson Discovery output? If there is a bug in Discovery, we should submit a bug report. If there is a bug in our Text Extensions for Pandas code, it needs to be fixed.
Monireh2 commented on 2022-02-03T01:36:03Z ----------------------------------------------------------------
{'section_title': {'location': {'end': 627943, 'begin': 627925}, 'text': 'Geographic Revenue'}, 'row_headers': [{'column_index_begin': 0, 'row_index_begin': 0, 'location': {'end': 703825, 'begin': 703796}, 'text': 'Total consolidated research,', 'row_index_end': 0, 'cell_id': 'rowHeader-703796-703825', 'column_index_end': 0, 'text_normalized': 'Total consolidated research,'}, {'column_index_begin': 0, 'row_index_begin': 1, 'location': {'end': 704313, 'begin': 704285}, 'text': 'development and engineering', 'row_index_end': 1, 'cell_id': 'rowHeader-704285-704313', 'column_index_end': 0, 'text_normalized': 'development and engineering'}, {'column_index_begin': 0, 'row_index_begin': 2, 'location': {'end': 705414, 'begin': 705389}, 'text': 'Non-operating adjustment', 'row_index_end': 2, 'cell_id': 'rowHeader-705389-705414', 'column_index_end': 0, 'text_normalized': 'Non-operating adjustment'}, {'column_index_begin': 0, 'row_index_begin': 3, 'location': {'end': 705914, 'begin': 705881}, 'text': 'Non-operating retirement-related', 'row_index_end': 3, 'cell_id': 'rowHeader-705881-705914', 'column_index_end': 0, 'text_normalized': 'Non-operating retirement-related'}, {'column_index_begin': 0, 'row_index_begin': 4, 'location': {'end': 706394, 'begin': 706379}, 'text': '(costs)/income', 'row_index_end': 4, 'cell_id': 'rowHeader-706379-706394', 'column_index_end': 0, 'text_normalized': '(costs)/income'}, {'column_index_begin': 0, 'row_index_begin': 5, 'location': {'end': 707502, 'begin': 707471}, 'text': 'Operating (non-GAAP) research,', 'row_index_end': 5, 'cell_id': 'rowHeader-707471-707502', 'column_index_end': 0, 'text_normalized': 'Operating (non-GAAP) research,'}, {'column_index_begin': 0, 'row_index_begin': 6, 'location': {'end': 707990, 'begin': 707962}, 'text': 'development and engineering', 'row_index_end': 6, 'cell_id': 'rowHeader-707962-707990', 'column_index_end': 0, 'text_normalized': 'development and engineering'}], 'table_headers': [], 'location': {'end': 
708798, 'begin': 703796}, 'text': 'Total consolidated research, development and engineering $5,247 $5,437 (3.5)%\nNon-operating adjustment\n Non-operating retirement-related (costs)/income (48) 77 NM\nOperating (non-GAAP) research,\n development and engineering $5,200 $5,514 (5.7)%\n', 'body_cells': [{'row_header_ids': ['rowHeader-703796-703825'], 'column_index_begin': 1, 'row_index_begin': 0, 'row_header_texts': ['Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 703903, 'begin': 703902}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Total consolidated research,'], 'cell_id': 'bodyCell-703902-703903'}, {'row_header_ids': ['rowHeader-703796-703825'], 'column_index_begin': 2, 'row_index_begin': 0, 'row_header_texts': ['Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 703968, 'begin': 703967}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Total consolidated research,'], 'cell_id': 'bodyCell-703967-703968'}, {'row_header_ids': ['rowHeader-703796-703825'], 'column_index_begin': 3, 'row_index_begin': 0, 'row_header_texts': ['Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 704033, 'begin': 704032}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Total consolidated research,'], 'cell_id': 'bodyCell-704032-704033'}, {'row_header_ids': ['rowHeader-704285-704313', 'rowHeader-703796-703825'], 'column_index_begin': 1, 'row_index_begin': 1, 'row_header_texts': ['development and engineering', 'Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 
'column_header_texts_normalized': [], 'location': {'end': 704581, 'begin': 704574}, 'attributes': [{'location': {'end': 704580, 'begin': 704574}, 'text': '$5,247', 'type': 'Currency'}], 'text': '$5,247', 'row_index_end': 1, 'row_header_texts_normalized': ['development and engineering', 'Total consolidated research,'], 'cell_id': 'bodyCell-704574-704581'}, {'row_header_ids': ['rowHeader-704285-704313', 'rowHeader-703796-703825'], 'column_index_begin': 2, 'row_index_begin': 1, 'row_header_texts': ['development and engineering', 'Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 704848, 'begin': 704841}, 'attributes': [{'location': {'end': 704847, 'begin': 704841}, 'text': '$5,437', 'type': 'Currency'}], 'text': '$5,437', 'row_index_end': 1, 'row_header_texts_normalized': ['development and engineering', 'Total consolidated research,'], 'cell_id': 'bodyCell-704841-704848'}, {'row_header_ids': ['rowHeader-704285-704313', 'rowHeader-703796-703825'], 'column_index_begin': 3, 'row_index_begin': 1, 'row_header_texts': ['development and engineering', 'Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705118, 'begin': 705111}, 'attributes': [{'location': {'end': 705115, 'begin': 705112}, 'text': '3.5', 'type': 'Number'}], 'text': '(3.5)%', 'row_index_end': 1, 'row_header_texts_normalized': ['development and engineering', 'Total consolidated research,'], 'cell_id': 'bodyCell-705111-705118'}, {'row_header_ids': ['rowHeader-705389-705414'], 'column_index_begin': 1, 'row_index_begin': 2, 'row_header_texts': ['Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705492, 'begin': 705491}, 'attributes': [], 'text': '', 'row_index_end': 2, 
'row_header_texts_normalized': ['Non-operating adjustment'], 'cell_id': 'bodyCell-705491-705492'}, {'row_header_ids': ['rowHeader-705389-705414'], 'column_index_begin': 2, 'row_index_begin': 2, 'row_header_texts': ['Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705557, 'begin': 705556}, 'attributes': [], 'text': '', 'row_index_end': 2, 'row_header_texts_normalized': ['Non-operating adjustment'], 'cell_id': 'bodyCell-705556-705557'}, {'row_header_ids': ['rowHeader-705389-705414'], 'column_index_begin': 3, 'row_index_begin': 2, 'row_header_texts': ['Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705622, 'begin': 705621}, 'attributes': [], 'text': '', 'row_index_end': 2, 'row_header_texts_normalized': ['Non-operating adjustment'], 'cell_id': 'bodyCell-705621-705622'}, {'row_header_ids': ['rowHeader-705881-705914', 'rowHeader-705389-705414'], 'column_index_begin': 1, 'row_index_begin': 3, 'row_header_texts': ['Non-operating retirement-related', 'Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705992, 'begin': 705991}, 'attributes': [], 'text': '', 'row_index_end': 3, 'row_header_texts_normalized': ['Non-operating retirement-related', 'Non-operating adjustment'], 'cell_id': 'bodyCell-705991-705992'}, {'row_header_ids': ['rowHeader-705881-705914', 'rowHeader-705389-705414'], 'column_index_begin': 2, 'row_index_begin': 3, 'row_header_texts': ['Non-operating retirement-related', 'Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 706057, 'begin': 706056}, 'attributes': [], 'text': '', 'row_index_end': 3, 
'row_header_texts_normalized': ['Non-operating retirement-related', 'Non-operating adjustment'], 'cell_id': 'bodyCell-706056-706057'}, {'row_header_ids': ['rowHeader-705881-705914', 'rowHeader-705389-705414'], 'column_index_begin': 3, 'row_index_begin': 3, 'row_header_texts': ['Non-operating retirement-related', 'Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 706122, 'begin': 706121}, 'attributes': [], 'text': '', 'row_index_end': 3, 'row_header_texts_normalized': ['Non-operating retirement-related', 'Non-operating adjustment'], 'cell_id': 'bodyCell-706121-706122'}, {'row_header_ids': ['rowHeader-706379-706394', 'rowHeader-705389-705414', 'rowHeader-705881-705914'], 'column_index_begin': 1, 'row_index_begin': 4, 'row_header_texts': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 706662, 'begin': 706657}, 'attributes': [{'location': {'end': 706660, 'begin': 706658}, 'text': '48', 'type': 'Number'}], 'text': '(48)', 'row_index_end': 4, 'row_header_texts_normalized': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'cell_id': 'bodyCell-706657-706662'}, {'row_header_ids': ['rowHeader-706379-706394', 'rowHeader-705389-705414', 'rowHeader-705881-705914'], 'column_index_begin': 2, 'row_index_begin': 4, 'row_header_texts': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 706929, 'begin': 706926}, 'attributes': [{'location': {'end': 706928, 'begin': 706926}, 'text': '77', 'type': 'Number'}], 'text': '77', 'row_index_end': 4, 'row_header_texts_normalized': ['(costs)/income', 'Non-operating 
adjustment', 'Non-operating retirement-related'], 'cell_id': 'bodyCell-706926-706929'}, {'row_header_ids': ['rowHeader-706379-706394', 'rowHeader-705389-705414', 'rowHeader-705881-705914'], 'column_index_begin': 3, 'row_index_begin': 4, 'row_header_texts': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 707196, 'begin': 707193}, 'attributes': [], 'text': 'NM', 'row_index_end': 4, 'row_header_texts_normalized': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'cell_id': 'bodyCell-707193-707196'}, {'row_header_ids': ['rowHeader-707471-707502'], 'column_index_begin': 1, 'row_index_begin': 5, 'row_header_texts': ['Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 707580, 'begin': 707579}, 'attributes': [], 'text': '', 'row_index_end': 5, 'row_header_texts_normalized': ['Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-707579-707580'}, {'row_header_ids': ['rowHeader-707471-707502'], 'column_index_begin': 2, 'row_index_begin': 5, 'row_header_texts': ['Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 707645, 'begin': 707644}, 'attributes': [], 'text': '', 'row_index_end': 5, 'row_header_texts_normalized': ['Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-707644-707645'}, {'row_header_ids': ['rowHeader-707471-707502'], 'column_index_begin': 3, 'row_index_begin': 5, 'row_header_texts': ['Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 707710, 'begin': 707709}, 'attributes': [], 'text': '', 'row_index_end': 5, 
'row_header_texts_normalized': ['Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-707709-707710'}, {'row_header_ids': ['rowHeader-707962-707990', 'rowHeader-707471-707502'], 'column_index_begin': 1, 'row_index_begin': 6, 'row_header_texts': ['development and engineering', 'Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 708259, 'begin': 708252}, 'attributes': [{'location': {'end': 708258, 'begin': 708252}, 'text': '$5,200', 'type': 'Currency'}], 'text': '$5,200', 'row_index_end': 6, 'row_header_texts_normalized': ['development and engineering', 'Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-708252-708259'}, {'row_header_ids': ['rowHeader-707962-707990', 'rowHeader-707471-707502'], 'column_index_begin': 2, 'row_index_begin': 6, 'row_header_texts': ['development and engineering', 'Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 708527, 'begin': 708520}, 'attributes': [{'location': {'end': 708526, 'begin': 708520}, 'text': '$5,514', 'type': 'Currency'}], 'text': '$5,514', 'row_index_end': 6, 'row_header_texts_normalized': ['development and engineering', 'Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-708520-708527'}, {'row_header_ids': ['rowHeader-707962-707990', 'rowHeader-707471-707502'], 'column_index_begin': 3, 'row_index_begin': 6, 'row_header_texts': ['development and engineering', 'Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 708798, 'begin': 708791}, 'attributes': [{'location': {'end': 708795, 'begin': 708792}, 'text': '5.7', 'type': 'Number'}], 'text': '(5.7)%', 'row_index_end': 6, 'row_header_texts_normalized': ['development and engineering', 'Operating (non-GAAP) research,'], 
'cell_id': 'bodyCell-708791-708798'}], 'contexts': [{'location': {'end': 702649, 'begin': 702242}, 'text': 'For the year ended December 31: 2015 2014'}, {'location': {'end': 702865, 'begin': 702855}, 'text': 'Yr.-to-Yr.'}, {'location': {'end': 703240, 'begin': 703046}, 'text': 'Percent\nChange'}, {'location': {'end': 709029, 'begin': 709012}, 'text': 'NM-Not meaningful'}, {'location': {'end': 709287, 'begin': 709227}, 'text': 'Research, development and engineering (RD&E) expense was'}, {'location': {'end': 709775, 'begin': 709521}, 'text': '6.4 percent of revenue in 2015 and 5.9 percent of revenue in 2014.'}, {'location': {'end': 710189, 'begin': 709945}, 'text': 'RD&E expense decreased 3.5 percent in 2015 versus 2014 primarily driven by:'}], 'key_value_pairs': [{'value': [{'location': {'end': 704580, 'begin': 704574}, 'text': '$5,247', 'cell_id': 'bodyCell-704574-704581'}], 'key': {'location': {'end': 704312, 'begin': 704285}, 'text': 'development and engineering', 'cell_id': 'rowHeader-704285-704313'}}, {'value': [{'location': {'end': 708258, 'begin': 708252}, 'text': '$5,200', 'cell_id': 'bodyCell-708252-708259'}], 'key': {'location': {'end': 707989, 'begin': 707962}, 'text': 'development and engineering', 'cell_id': 'rowHeader-707962-707990'}}], 'title': {}, 'column_headers': []}, {'section_title': {'location': {'end': 627943, 'begin': 627925}, 'text': 'Geographic Revenue'}, 'row_headers': [{'column_index_begin': 0, 'row_index_begin': 0, 'location': {'end': 714975, 'begin': 714949}, 'text': 'Sales and other transfers', 'row_index_end': 0, 'cell_id': 'rowHeader-714949-714975', 'column_index_end': 0, 'text_normalized': 'Sales and other transfers'}, {'column_index_begin': 0, 'row_index_begin': 1, 'location': {'end': 715466, 'begin': 715441}, 'text': 'of intellectual property', 'row_index_end': 1, 'cell_id': 'rowHeader-715441-715466', 'column_index_end': 0, 'text_normalized': 'of intellectual property'}, {'column_index_begin': 0, 'row_index_begin': 2, 'location': 
{'end': 716567, 'begin': 716538}, 'text': 'Licensing/royalty-based fees', 'row_index_end': 2, 'cell_id': 'rowHeader-716538-716567', 'column_index_end': 0, 'text_normalized': 'Licensing/royalty-based fees'}, {'column_index_begin': 0, 'row_index_begin': 3, 'location': {'end': 717670, 'begin': 717644}, 'text': 'Custom development income', 'row_index_end': 3, 'cell_id': 'rowHeader-717644-717670', 'column_index_end': 0, 'text_normalized': 'Custom development income'}, {'column_index_begin': 0, 'row_index_begin': 4, 'location': {'end': 718753, 'begin': 718747}, 'text': 'Total', 'row_index_end': 4, 'cell_id': 'rowHeader-718747-718753', 'column_index_end': 0, 'text_normalized': 'Total'}], 'table_headers': [], 'location': {'end': 719555, 'begin': 714949}, 'text': 'Sales and other transfers of intellectual property $303 $283 7.1%\nLicensing/royalty-based fees 117 129 (9.8)\nCustom development income 262 330 (20.5)\nTotal $682 $742 (8.1)%\n', 'body_cells': [{'row_header_ids': ['rowHeader-714949-714975'], 'column_index_begin': 1, 'row_index_begin': 0, 'row_header_texts': ['Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 715053, 'begin': 715052}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Sales and other transfers'], 'cell_id': 'bodyCell-715052-715053'}, {'row_header_ids': ['rowHeader-714949-714975'], 'column_index_begin': 2, 'row_index_begin': 0, 'row_header_texts': ['Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 715118, 'begin': 715117}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Sales and other transfers'], 'cell_id': 'bodyCell-715117-715118'}, {'row_header_ids': ['rowHeader-714949-714975'], 'column_index_begin': 3, 'row_index_begin': 0, 'row_header_texts': ['Sales and 
other transfers'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 715183, 'begin': 715182}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Sales and other transfers'], 'cell_id': 'bodyCell-715182-715183'}, {'row_header_ids': ['rowHeader-715441-715466', 'rowHeader-714949-714975'], 'column_index_begin': 1, 'row_index_begin': 1, 'row_header_texts': ['of intellectual property', 'Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 715734, 'begin': 715729}, 'attributes': [{'location': {'end': 715733, 'begin': 715729}, 'text': '$303', 'type': 'Currency'}], 'text': '$303', 'row_index_end': 1, 'row_header_texts_normalized': ['of intellectual property', 'Sales and other transfers'], 'cell_id': 'bodyCell-715729-715734'}, {'row_header_ids': ['rowHeader-715441-715466', 'rowHeader-714949-714975'], 'column_index_begin': 2, 'row_index_begin': 1, 'row_header_texts': ['of intellectual property', 'Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 716001, 'begin': 715996}, 'attributes': [{'location': {'end': 716000, 'begin': 715996}, 'text': '$283', 'type': 'Currency'}], 'text': '$283', 'row_index_end': 1, 'row_header_texts_normalized': ['of intellectual property', 'Sales and other transfers'], 'cell_id': 'bodyCell-715996-716001'}, {'row_header_ids': ['rowHeader-715441-715466', 'rowHeader-714949-714975'], 'column_index_begin': 3, 'row_index_begin': 1, 'row_header_texts': ['of intellectual property', 'Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 716269, 'begin': 716264}, 'attributes': [{'location': {'end': 716268, 'begin': 716264}, 
'text': '7.1%', 'type': 'Percentage'}], 'text': '7.1%', 'row_index_end': 1, 'row_header_texts_normalized': ['of intellectual property', 'Sales and other transfers'], 'cell_id': 'bodyCell-716264-716269'}, {'row_header_ids': ['rowHeader-716538-716567'], 'column_index_begin': 1, 'row_index_begin': 2, 'row_header_texts': ['Licensing/royalty-based fees'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 716835, 'begin': 716831}, 'attributes': [{'location': {'end': 716834, 'begin': 716831}, 'text': '117', 'type': 'Number'}], 'text': '117', 'row_index_end': 2, 'row_header_texts_normalized': ['Licensing/royalty-based fees'], 'cell_id': 'bodyCell-716831-716835'}, {'row_header_ids': ['rowHeader-716538-716567'], 'column_index_begin': 2, 'row_index_begin': 2, 'row_header_texts': ['Licensing/royalty-based fees'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 717104, 'begin': 717100}, 'attributes': [{'location': {'end': 717103, 'begin': 717100}, 'text': '129', 'type': 'Number'}], 'text': '129', 'row_index_end': 2, 'row_header_texts_normalized': ['Licensing/royalty-based fees'], 'cell_id': 'bodyCell-717100-717104'}, {'row_header_ids': ['rowHeader-716538-716567'], 'column_index_begin': 3, 'row_index_begin': 2, 'row_header_texts': ['Licensing/royalty-based fees'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 717374, 'begin': 717368}, 'attributes': [{'location': {'end': 717372, 'begin': 717369}, 'text': '9.8', 'type': 'Number'}], 'text': '(9.8)', 'row_index_end': 2, 'row_header_texts_normalized': ['Licensing/royalty-based fees'], 'cell_id': 'bodyCell-717368-717374'}, {'row_header_ids': ['rowHeader-717644-717670'], 'column_index_begin': 1, 'row_index_begin': 3, 'row_header_texts': ['Custom development income'], 
'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 717939, 'begin': 717935}, 'attributes': [{'location': {'end': 717938, 'begin': 717935}, 'text': '262', 'type': 'Number'}], 'text': '262', 'row_index_end': 3, 'row_header_texts_normalized': ['Custom development income'], 'cell_id': 'bodyCell-717935-717939'}, {'row_header_ids': ['rowHeader-717644-717670'], 'column_index_begin': 2, 'row_index_begin': 3, 'row_header_texts': ['Custom development income'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 718208, 'begin': 718204}, 'attributes': [{'location': {'end': 718207, 'begin': 718204}, 'text': '330', 'type': 'Number'}], 'text': '330', 'row_index_end': 3, 'row_header_texts_normalized': ['Custom development income'], 'cell_id': 'bodyCell-718204-718208'}, {'row_header_ids': ['rowHeader-717644-717670'], 'column_index_begin': 3, 'row_index_begin': 3, 'row_header_texts': ['Custom development income'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 718473, 'begin': 718466}, 'attributes': [{'location': {'end': 718471, 'begin': 718467}, 'text': '20.5', 'type': 'Number'}], 'text': '(20.5)', 'row_index_end': 3, 'row_header_texts_normalized': ['Custom development income'], 'cell_id': 'bodyCell-718466-718473'}, {'row_header_ids': ['rowHeader-718747-718753'], 'column_index_begin': 1, 'row_index_begin': 4, 'row_header_texts': ['Total'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 719022, 'begin': 719017}, 'attributes': [{'location': {'end': 719021, 'begin': 719017}, 'text': '$682', 'type': 'Currency'}], 'text': '$682', 'row_index_end': 4, 'row_header_texts_normalized': ['Total'], 'cell_id': 'bodyCell-719017-719022'}, {'row_header_ids': 
['rowHeader-718747-718753'], 'column_index_begin': 2, 'row_index_begin': 4, 'row_header_texts': ['Total'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 719287, 'begin': 719282}, 'attributes': [{'location': {'end': 719286, 'begin': 719282}, 'text': '$742', 'type': 'Currency'}], 'text': '$742', 'row_index_end': 4, 'row_header_texts_normalized': ['Total'], 'cell_id': 'bodyCell-719282-719287'}, {'row_header_ids': ['rowHeader-718747-718753'], 'column_index_begin': 3, 'row_index_begin': 4, 'row_header_texts': ['Total'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 719555, 'begin': 719548}, 'attributes': [{'location': {'end': 719552, 'begin': 719549}, 'text': '8.1', 'type': 'Number'}], 'text': '(8.1)%', 'row_index_end': 4, 'row_header_texts_normalized': ['Total'], 'cell_id': 'bodyCell-719548-719555'}], 'contexts': [{'location': {'end': 713812, 'begin': 713406}, 'text': 'For the year ended December 31: 2015 2014'}, {'location': {'end': 714027, 'begin': 714017}, 'text': 'Yr.-to-Yr.'}, {'location': {'end': 714400, 'begin': 714207}, 'text': 'Percent Change'}, {'location': {'end': 720499, 'begin': 719763}, 'text': 'The timing and amount of Sales and other transfers of IP may vary significantly from period to period depending upon the timing of divestitures, economic conditions, industry consolidation and the timing of new patents and know-how development.'}, {'location': {'end': 720730, 'begin': 720500}, 'text': 'There were no material individual IP transactions in 2015 or 2014.'}, {'location': {'end': 720953, 'begin': 720927}, 'text': 'Other (Income) and Expense'}], 'key_value_pairs': [{'value': [{'location': {'end': 719021, 'begin': 719017}, 'text': '$682', 'cell_id': 'bodyCell-719017-719022'}], 'key': {'location': {'end': 718752, 'begin': 718747}, 'text': 'Total', 'cell_id': 
'rowHeader-718747-718753'}}], 'title': {}, 'column_headers': []},
If you look at the JSON, you can see that it also covers other tables under the Geographic Revenue section title; please check pages 56-57 in IBM_Annual_Report_2015. I would say it is an error in neither Text Extensions for Pandas nor WD.
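A quick way to see this in the raw output is to list each table's section title next to its row header texts. The sketch below assumes only the `section_title` and `row_headers` keys visible in the JSON above; the helper name and the abbreviated sample are for illustration.

```python
def summarize_tables(tables):
    """Return (section title, row header texts) for each table dict."""
    summary = []
    for table in tables:
        title = table.get("section_title", {}).get("text", "")
        headers = [h["text"] for h in table.get("row_headers", [])]
        summary.append((title, headers))
    return summary


# Abbreviated stand-ins built from values in the JSON above.
tables = [
    {"section_title": {"text": "Geographic Revenue"},
     "row_headers": [{"text": "Total consolidated research,"},
                     {"text": "development and engineering"}]},
    {"section_title": {"text": "Geographic Revenue"},
     "row_headers": [{"text": "Sales and other transfers"},
                     {"text": "of intellectual property"}]},
]

for title, headers in summarize_tables(tables):
    print(title, "->", headers)
```

Both tables report the same section title even though their row headers have nothing to do with regions, so filtering on the section title alone pulls them in.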
View / edit / reply to this conversation on ReviewNB
frreiss commented on 2022-02-01T00:39:53Z ----------------------------------------------------------------
This table contains more duplicates than it did before. Why is that happening? Is the latest version of Watson Discovery returning multiple copies of the same table?
Monireh2 commented on 2022-02-03T21:25:50Z ----------------------------------------------------------------
I checked this carefully. For example, there are two Geographic Revenue tables for 2012-2011, and the same is true for 2011-2010, so we end up with 4 values for America for 2011. You can search for 44,944 to validate this. In IBM_Annual_Report_2012.pdf I can see two Geographic Revenue tables, one for 2012-2011 and another for 2011-2010, which explains why we have 4 values for each region for each year. Watson Discovery has listed both tables for each document correctly.
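The counting argument can be checked with a small pandas sketch. The rows below are placeholders, not notebook output: the column names, the second file name, and the assumption that every occurrence carries the same figure are all illustrative.

```python
import pandas as pd

# Placeholder rows, NOT taken from the notebook: one row per (document, table)
# pair in which the America 2011 figure would appear.
rows = pd.DataFrame({
    "document": ["IBM_Annual_Report_2012.pdf"] * 2
                + ["IBM_Annual_Report_2011.pdf"] * 2,  # second name is assumed
    "table_index": [0, 1, 0, 1],
    "region": ["America"] * 4,
    "year": [2011] * 4,
    "value": [44944] * 4,
})

# How many times does each (region, year) pair occur across all tables?
counts = rows.groupby(["region", "year"]).size()
print(counts)

# Collapse repeats that carry the same figure into a single row.
deduped = rows.drop_duplicates(subset=["region", "year", "value"])
print(deduped)
```

If the duplicates are unwanted in the final table, a `drop_duplicates` step like this one would be one option, at the cost of hiding genuine disagreements between reports.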
View / edit / reply to this conversation on ReviewNB
frreiss commented on 2022-02-01T00:39:54Z ----------------------------------------------------------------
Data from 2018 and 2019 is no longer here. What happened to it?
Monireh2 commented on 2022-02-04T00:36:56Z ----------------------------------------------------------------
Checking why the data from 2019.pdf has not been processed; I can see that some results have been returned by WD for 2019.pdf!
Monireh2 commented on 2022-02-10T23:12:12Z ----------------------------------------------------------------
The column_header_texts for the 2018-2019 table is empty in Watson Discovery's JSON output, which is why we were not retaining the rows for the 2019-2018 table:
"column_header_texts": [ "", "", "", "", "" ],
| | text | row_header_texts_0 | column_header_texts | attributes.type | value |
|---|---|---|---|---|---|
| 0 | 2019 | For the year ended December 31: | | [DateTime] | 2019 |
| 1 | 2018 | For the year ended December 31: | | [Number] | 2018 |
Monireh2 commented on 2022-02-10T23:14:39Z ----------------------------------------------------------------
I am changing the retaining condition, or alternatively copying the value from the text column into column_header_texts when the text matches the \d{4} regex pattern, so that the 2018-2019 info is included as well.
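A minimal sketch of the second option, assuming the cells live in a pandas DataFrame whose `column_header_texts` column holds plain strings (in the real data it may hold lists) and that a four-digit year is matched with `\d{4}`. The sample rows and the `fill_year_headers` name are hypothetical.

```python
import pandas as pd

# Hypothetical rows mimicking the output shown above: year cells whose
# column_header_texts came back empty from Watson Discovery.
cells = pd.DataFrame({
    "text": ["2019", "2018", "$742"],
    "row_header_texts_0": ["For the year ended December 31:"] * 2 + ["Total"],
    "column_header_texts": ["", "", ""],
})


def fill_year_headers(df: pd.DataFrame) -> pd.DataFrame:
    """Copy `text` into `column_header_texts` where the header is empty
    and the text is exactly a four-digit number."""
    out = df.copy()
    is_year = out["text"].str.fullmatch(r"\d{4}", na=False)
    is_empty = out["column_header_texts"].str.strip() == ""
    mask = is_year & is_empty
    out.loc[mask, "column_header_texts"] = out.loc[mask, "text"]
    return out


fixed = fill_year_headers(cells)
print(fixed)
```

Using `fullmatch` rather than a substring match keeps values such as `$742` or longer numbers from being mistaken for years.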
Monireh2 commented on 2022-02-11T18:02:11Z ----------------------------------------------------------------
Created an issue with the Discovery's team: https://github.ibm.com/Watson-Discovery/disco-issue-tracker/issues/10974
Thanks for the pointer, Fred (@frreiss). I actually tried that, but it gives me a black screen with an inactive play button. The only way I could resolve the issue was by using the Python snippet above. I will fix the items in your other comments.
View entire conversation on ReviewNB
@frreiss: The error does make sense to me. Whenever the code gets an empty value, it substitutes pd.NA for the value and prints the above error. I can open an issue on that if you think the code should be changed.
Please see lines 229-231 here:
https://github.com/CODAIT/text-extensions-for-pandas/blob/master/text_extensions_for_pandas/io/watson/tables.py
except ValueError:
    ans = pd.NA
    print(f"ERROR READING VALUE:\"{val}\"\t Filling with <NA>")
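To illustrate why an empty cell triggers that message, here is a self-contained sketch of the same fallback pattern. It is not the library's actual function: `to_number_or_na` and its cleaning steps are assumptions made for illustration, and only the `except` branch mirrors the lines quoted above.

```python
import pandas as pd


def to_number_or_na(val: str):
    """Hypothetical helper: parse a cell string as a float, or return pd.NA."""
    cleaned = val.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        # Same fallback as the quoted lines: report the value and fill with <NA>.
        print(f'ERROR READING VALUE:"{val}"\t Filling with <NA>')
        return pd.NA


values = [to_number_or_na(v) for v in ["$5,247", "", "NM"]]
print(values)
```

Both the empty string and a non-numeric marker such as `NM` raise `ValueError` inside `float`, so each one produces the message and a `pd.NA` entry.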
View entire conversation on ReviewNB
{'section_title': {'location': {'end': 627943, 'begin': 627925}, 'text': 'Geographic Revenue'}, 'row_headers': [{'column_index_begin': 0, 'row_index_begin': 0, 'location': {'end': 703825, 'begin': 703796}, 'text': 'Total consolidated research,', 'row_index_end': 0, 'cell_id': 'rowHeader-703796-703825', 'column_index_end': 0, 'text_normalized': 'Total consolidated research,'}, {'column_index_begin': 0, 'row_index_begin': 1, 'location': {'end': 704313, 'begin': 704285}, 'text': 'development and engineering', 'row_index_end': 1, 'cell_id': 'rowHeader-704285-704313', 'column_index_end': 0, 'text_normalized': 'development and engineering'}, {'column_index_begin': 0, 'row_index_begin': 2, 'location': {'end': 705414, 'begin': 705389}, 'text': 'Non-operating adjustment', 'row_index_end': 2, 'cell_id': 'rowHeader-705389-705414', 'column_index_end': 0, 'text_normalized': 'Non-operating adjustment'}, {'column_index_begin': 0, 'row_index_begin': 3, 'location': {'end': 705914, 'begin': 705881}, 'text': 'Non-operating retirement-related', 'row_index_end': 3, 'cell_id': 'rowHeader-705881-705914', 'column_index_end': 0, 'text_normalized': 'Non-operating retirement-related'}, {'column_index_begin': 0, 'row_index_begin': 4, 'location': {'end': 706394, 'begin': 706379}, 'text': '(costs)/income', 'row_index_end': 4, 'cell_id': 'rowHeader-706379-706394', 'column_index_end': 0, 'text_normalized': '(costs)/income'}, {'column_index_begin': 0, 'row_index_begin': 5, 'location': {'end': 707502, 'begin': 707471}, 'text': 'Operating (non-GAAP) research,', 'row_index_end': 5, 'cell_id': 'rowHeader-707471-707502', 'column_index_end': 0, 'text_normalized': 'Operating (non-GAAP) research,'}, {'column_index_begin': 0, 'row_index_begin': 6, 'location': {'end': 707990, 'begin': 707962}, 'text': 'development and engineering', 'row_index_end': 6, 'cell_id': 'rowHeader-707962-707990', 'column_index_end': 0, 'text_normalized': 'development and engineering'}], 'table_headers': [], 'location': {'end': 
708798, 'begin': 703796}, 'text': 'Total consolidated research, development and engineering $5,247 $5,437 (3.5)%\nNon-operating adjustment\n Non-operating retirement-related (costs)/income (48) 77 NM\nOperating (non-GAAP) research,\n development and engineering $5,200 $5,514 (5.7)%\n', 'body_cells': [{'row_header_ids': ['rowHeader-703796-703825'], 'column_index_begin': 1, 'row_index_begin': 0, 'row_header_texts': ['Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 703903, 'begin': 703902}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Total consolidated research,'], 'cell_id': 'bodyCell-703902-703903'}, {'row_header_ids': ['rowHeader-703796-703825'], 'column_index_begin': 2, 'row_index_begin': 0, 'row_header_texts': ['Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 703968, 'begin': 703967}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Total consolidated research,'], 'cell_id': 'bodyCell-703967-703968'}, {'row_header_ids': ['rowHeader-703796-703825'], 'column_index_begin': 3, 'row_index_begin': 0, 'row_header_texts': ['Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 704033, 'begin': 704032}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Total consolidated research,'], 'cell_id': 'bodyCell-704032-704033'}, {'row_header_ids': ['rowHeader-704285-704313', 'rowHeader-703796-703825'], 'column_index_begin': 1, 'row_index_begin': 1, 'row_header_texts': ['development and engineering', 'Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 
'column_header_texts_normalized': [], 'location': {'end': 704581, 'begin': 704574}, 'attributes': [{'location': {'end': 704580, 'begin': 704574}, 'text': '$5,247', 'type': 'Currency'}], 'text': '$5,247', 'row_index_end': 1, 'row_header_texts_normalized': ['development and engineering', 'Total consolidated research,'], 'cell_id': 'bodyCell-704574-704581'}, {'row_header_ids': ['rowHeader-704285-704313', 'rowHeader-703796-703825'], 'column_index_begin': 2, 'row_index_begin': 1, 'row_header_texts': ['development and engineering', 'Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 704848, 'begin': 704841}, 'attributes': [{'location': {'end': 704847, 'begin': 704841}, 'text': '$5,437', 'type': 'Currency'}], 'text': '$5,437', 'row_index_end': 1, 'row_header_texts_normalized': ['development and engineering', 'Total consolidated research,'], 'cell_id': 'bodyCell-704841-704848'}, {'row_header_ids': ['rowHeader-704285-704313', 'rowHeader-703796-703825'], 'column_index_begin': 3, 'row_index_begin': 1, 'row_header_texts': ['development and engineering', 'Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705118, 'begin': 705111}, 'attributes': [{'location': {'end': 705115, 'begin': 705112}, 'text': '3.5', 'type': 'Number'}], 'text': '(3.5)%', 'row_index_end': 1, 'row_header_texts_normalized': ['development and engineering', 'Total consolidated research,'], 'cell_id': 'bodyCell-705111-705118'}, {'row_header_ids': ['rowHeader-705389-705414'], 'column_index_begin': 1, 'row_index_begin': 2, 'row_header_texts': ['Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705492, 'begin': 705491}, 'attributes': [], 'text': '', 'row_index_end': 2, 
'row_header_texts_normalized': ['Non-operating adjustment'], 'cell_id': 'bodyCell-705491-705492'}, {'row_header_ids': ['rowHeader-705389-705414'], 'column_index_begin': 2, 'row_index_begin': 2, 'row_header_texts': ['Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705557, 'begin': 705556}, 'attributes': [], 'text': '', 'row_index_end': 2, 'row_header_texts_normalized': ['Non-operating adjustment'], 'cell_id': 'bodyCell-705556-705557'}, {'row_header_ids': ['rowHeader-705389-705414'], 'column_index_begin': 3, 'row_index_begin': 2, 'row_header_texts': ['Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705622, 'begin': 705621}, 'attributes': [], 'text': '', 'row_index_end': 2, 'row_header_texts_normalized': ['Non-operating adjustment'], 'cell_id': 'bodyCell-705621-705622'}, {'row_header_ids': ['rowHeader-705881-705914', 'rowHeader-705389-705414'], 'column_index_begin': 1, 'row_index_begin': 3, 'row_header_texts': ['Non-operating retirement-related', 'Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705992, 'begin': 705991}, 'attributes': [], 'text': '', 'row_index_end': 3, 'row_header_texts_normalized': ['Non-operating retirement-related', 'Non-operating adjustment'], 'cell_id': 'bodyCell-705991-705992'}, {'row_header_ids': ['rowHeader-705881-705914', 'rowHeader-705389-705414'], 'column_index_begin': 2, 'row_index_begin': 3, 'row_header_texts': ['Non-operating retirement-related', 'Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 706057, 'begin': 706056}, 'attributes': [], 'text': '', 'row_index_end': 3, 
'row_header_texts_normalized': ['Non-operating retirement-related', 'Non-operating adjustment'], 'cell_id': 'bodyCell-706056-706057'}, {'row_header_ids': ['rowHeader-705881-705914', 'rowHeader-705389-705414'], 'column_index_begin': 3, 'row_index_begin': 3, 'row_header_texts': ['Non-operating retirement-related', 'Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 706122, 'begin': 706121}, 'attributes': [], 'text': '', 'row_index_end': 3, 'row_header_texts_normalized': ['Non-operating retirement-related', 'Non-operating adjustment'], 'cell_id': 'bodyCell-706121-706122'}, {'row_header_ids': ['rowHeader-706379-706394', 'rowHeader-705389-705414', 'rowHeader-705881-705914'], 'column_index_begin': 1, 'row_index_begin': 4, 'row_header_texts': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 706662, 'begin': 706657}, 'attributes': [{'location': {'end': 706660, 'begin': 706658}, 'text': '48', 'type': 'Number'}], 'text': '(48)', 'row_index_end': 4, 'row_header_texts_normalized': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'cell_id': 'bodyCell-706657-706662'}, {'row_header_ids': ['rowHeader-706379-706394', 'rowHeader-705389-705414', 'rowHeader-705881-705914'], 'column_index_begin': 2, 'row_index_begin': 4, 'row_header_texts': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 706929, 'begin': 706926}, 'attributes': [{'location': {'end': 706928, 'begin': 706926}, 'text': '77', 'type': 'Number'}], 'text': '77', 'row_index_end': 4, 'row_header_texts_normalized': ['(costs)/income', 'Non-operating 
adjustment', 'Non-operating retirement-related'], 'cell_id': 'bodyCell-706926-706929'}, {'row_header_ids': ['rowHeader-706379-706394', 'rowHeader-705389-705414', 'rowHeader-705881-705914'], 'column_index_begin': 3, 'row_index_begin': 4, 'row_header_texts': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 707196, 'begin': 707193}, 'attributes': [], 'text': 'NM', 'row_index_end': 4, 'row_header_texts_normalized': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'cell_id': 'bodyCell-707193-707196'}, {'row_header_ids': ['rowHeader-707471-707502'], 'column_index_begin': 1, 'row_index_begin': 5, 'row_header_texts': ['Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 707580, 'begin': 707579}, 'attributes': [], 'text': '', 'row_index_end': 5, 'row_header_texts_normalized': ['Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-707579-707580'}, {'row_header_ids': ['rowHeader-707471-707502'], 'column_index_begin': 2, 'row_index_begin': 5, 'row_header_texts': ['Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 707645, 'begin': 707644}, 'attributes': [], 'text': '', 'row_index_end': 5, 'row_header_texts_normalized': ['Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-707644-707645'}, {'row_header_ids': ['rowHeader-707471-707502'], 'column_index_begin': 3, 'row_index_begin': 5, 'row_header_texts': ['Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 707710, 'begin': 707709}, 'attributes': [], 'text': '', 'row_index_end': 5, 
'row_header_texts_normalized': ['Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-707709-707710'}, {'row_header_ids': ['rowHeader-707962-707990', 'rowHeader-707471-707502'], 'column_index_begin': 1, 'row_index_begin': 6, 'row_header_texts': ['development and engineering', 'Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 708259, 'begin': 708252}, 'attributes': [{'location': {'end': 708258, 'begin': 708252}, 'text': '$5,200', 'type': 'Currency'}], 'text': '$5,200', 'row_index_end': 6, 'row_header_texts_normalized': ['development and engineering', 'Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-708252-708259'}, {'row_header_ids': ['rowHeader-707962-707990', 'rowHeader-707471-707502'], 'column_index_begin': 2, 'row_index_begin': 6, 'row_header_texts': ['development and engineering', 'Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 708527, 'begin': 708520}, 'attributes': [{'location': {'end': 708526, 'begin': 708520}, 'text': '$5,514', 'type': 'Currency'}], 'text': '$5,514', 'row_index_end': 6, 'row_header_texts_normalized': ['development and engineering', 'Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-708520-708527'}, {'row_header_ids': ['rowHeader-707962-707990', 'rowHeader-707471-707502'], 'column_index_begin': 3, 'row_index_begin': 6, 'row_header_texts': ['development and engineering', 'Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 708798, 'begin': 708791}, 'attributes': [{'location': {'end': 708795, 'begin': 708792}, 'text': '5.7', 'type': 'Number'}], 'text': '(5.7)%', 'row_index_end': 6, 'row_header_texts_normalized': ['development and engineering', 'Operating (non-GAAP) research,'], 
'cell_id': 'bodyCell-708791-708798'}], 'contexts': [{'location': {'end': 702649, 'begin': 702242}, 'text': 'For the year ended December 31: 2015 2014'}, {'location': {'end': 702865, 'begin': 702855}, 'text': 'Yr.-to-Yr.'}, {'location': {'end': 703240, 'begin': 703046}, 'text': 'Percent\nChange'}, {'location': {'end': 709029, 'begin': 709012}, 'text': 'NM-Not meaningful'}, {'location': {'end': 709287, 'begin': 709227}, 'text': 'Research, development and engineering (RD&E) expense was'}, {'location': {'end': 709775, 'begin': 709521}, 'text': '6.4 percent of revenue in 2015 and 5.9 percent of revenue in 2014.'}, {'location': {'end': 710189, 'begin': 709945}, 'text': 'RD&E expense decreased 3.5 percent in 2015 versus 2014 primarily driven by:'}], 'key_value_pairs': [{'value': [{'location': {'end': 704580, 'begin': 704574}, 'text': '$5,247', 'cell_id': 'bodyCell-704574-704581'}], 'key': {'location': {'end': 704312, 'begin': 704285}, 'text': 'development and engineering', 'cell_id': 'rowHeader-704285-704313'}}, {'value': [{'location': {'end': 708258, 'begin': 708252}, 'text': '$5,200', 'cell_id': 'bodyCell-708252-708259'}], 'key': {'location': {'end': 707989, 'begin': 707962}, 'text': 'development and engineering', 'cell_id': 'rowHeader-707962-707990'}}], 'title': {}, 'column_headers': []}, {'section_title': {'location': {'end': 627943, 'begin': 627925}, 'text': 'Geographic Revenue'}, 'row_headers': [{'column_index_begin': 0, 'row_index_begin': 0, 'location': {'end': 714975, 'begin': 714949}, 'text': 'Sales and other transfers', 'row_index_end': 0, 'cell_id': 'rowHeader-714949-714975', 'column_index_end': 0, 'text_normalized': 'Sales and other transfers'}, {'column_index_begin': 0, 'row_index_begin': 1, 'location': {'end': 715466, 'begin': 715441}, 'text': 'of intellectual property', 'row_index_end': 1, 'cell_id': 'rowHeader-715441-715466', 'column_index_end': 0, 'text_normalized': 'of intellectual property'}, {'column_index_begin': 0, 'row_index_begin': 2, 'location': 
{'end': 716567, 'begin': 716538}, 'text': 'Licensing/royalty-based fees', 'row_index_end': 2, 'cell_id': 'rowHeader-716538-716567', 'column_index_end': 0, 'text_normalized': 'Licensing/royalty-based fees'}, {'column_index_begin': 0, 'row_index_begin': 3, 'location': {'end': 717670, 'begin': 717644}, 'text': 'Custom development income', 'row_index_end': 3, 'cell_id': 'rowHeader-717644-717670', 'column_index_end': 0, 'text_normalized': 'Custom development income'}, {'column_index_begin': 0, 'row_index_begin': 4, 'location': {'end': 718753, 'begin': 718747}, 'text': 'Total', 'row_index_end': 4, 'cell_id': 'rowHeader-718747-718753', 'column_index_end': 0, 'text_normalized': 'Total'}], 'table_headers': [], 'location': {'end': 719555, 'begin': 714949}, 'text': 'Sales and other transfers of intellectual property $303 $283 7.1%\nLicensing/royalty-based fees 117 129 (9.8)\nCustom development income 262 330 (20.5)\nTotal $682 $742 (8.1)%\n', 'body_cells': [{'row_header_ids': ['rowHeader-714949-714975'], 'column_index_begin': 1, 'row_index_begin': 0, 'row_header_texts': ['Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 715053, 'begin': 715052}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Sales and other transfers'], 'cell_id': 'bodyCell-715052-715053'}, {'row_header_ids': ['rowHeader-714949-714975'], 'column_index_begin': 2, 'row_index_begin': 0, 'row_header_texts': ['Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 715118, 'begin': 715117}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Sales and other transfers'], 'cell_id': 'bodyCell-715117-715118'}, {'row_header_ids': ['rowHeader-714949-714975'], 'column_index_begin': 3, 'row_index_begin': 0, 'row_header_texts': ['Sales and 
other transfers'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 715183, 'begin': 715182}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Sales and other transfers'], 'cell_id': 'bodyCell-715182-715183'}, {'row_header_ids': ['rowHeader-715441-715466', 'rowHeader-714949-714975'], 'column_index_begin': 1, 'row_index_begin': 1, 'row_header_texts': ['of intellectual property', 'Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 715734, 'begin': 715729}, 'attributes': [{'location': {'end': 715733, 'begin': 715729}, 'text': '$303', 'type': 'Currency'}], 'text': '$303', 'row_index_end': 1, 'row_header_texts_normalized': ['of intellectual property', 'Sales and other transfers'], 'cell_id': 'bodyCell-715729-715734'}, {'row_header_ids': ['rowHeader-715441-715466', 'rowHeader-714949-714975'], 'column_index_begin': 2, 'row_index_begin': 1, 'row_header_texts': ['of intellectual property', 'Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 716001, 'begin': 715996}, 'attributes': [{'location': {'end': 716000, 'begin': 715996}, 'text': '$283', 'type': 'Currency'}], 'text': '$283', 'row_index_end': 1, 'row_header_texts_normalized': ['of intellectual property', 'Sales and other transfers'], 'cell_id': 'bodyCell-715996-716001'}, {'row_header_ids': ['rowHeader-715441-715466', 'rowHeader-714949-714975'], 'column_index_begin': 3, 'row_index_begin': 1, 'row_header_texts': ['of intellectual property', 'Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 716269, 'begin': 716264}, 'attributes': [{'location': {'end': 716268, 'begin': 716264}, 
'text': '7.1%', 'type': 'Percentage'}], 'text': '7.1%', 'row_index_end': 1, 'row_header_texts_normalized': ['of intellectual property', 'Sales and other transfers'], 'cell_id': 'bodyCell-716264-716269'}, {'row_header_ids': ['rowHeader-716538-716567'], 'column_index_begin': 1, 'row_index_begin': 2, 'row_header_texts': ['Licensing/royalty-based fees'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 716835, 'begin': 716831}, 'attributes': [{'location': {'end': 716834, 'begin': 716831}, 'text': '117', 'type': 'Number'}], 'text': '117', 'row_index_end': 2, 'row_header_texts_normalized': ['Licensing/royalty-based fees'], 'cell_id': 'bodyCell-716831-716835'}, {'row_header_ids': ['rowHeader-716538-716567'], 'column_index_begin': 2, 'row_index_begin': 2, 'row_header_texts': ['Licensing/royalty-based fees'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 717104, 'begin': 717100}, 'attributes': [{'location': {'end': 717103, 'begin': 717100}, 'text': '129', 'type': 'Number'}], 'text': '129', 'row_index_end': 2, 'row_header_texts_normalized': ['Licensing/royalty-based fees'], 'cell_id': 'bodyCell-717100-717104'}, {'row_header_ids': ['rowHeader-716538-716567'], 'column_index_begin': 3, 'row_index_begin': 2, 'row_header_texts': ['Licensing/royalty-based fees'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 717374, 'begin': 717368}, 'attributes': [{'location': {'end': 717372, 'begin': 717369}, 'text': '9.8', 'type': 'Number'}], 'text': '(9.8)', 'row_index_end': 2, 'row_header_texts_normalized': ['Licensing/royalty-based fees'], 'cell_id': 'bodyCell-717368-717374'}, {'row_header_ids': ['rowHeader-717644-717670'], 'column_index_begin': 1, 'row_index_begin': 3, 'row_header_texts': ['Custom development income'], 
'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 717939, 'begin': 717935}, 'attributes': [{'location': {'end': 717938, 'begin': 717935}, 'text': '262', 'type': 'Number'}], 'text': '262', 'row_index_end': 3, 'row_header_texts_normalized': ['Custom development income'], 'cell_id': 'bodyCell-717935-717939'}, {'row_header_ids': ['rowHeader-717644-717670'], 'column_index_begin': 2, 'row_index_begin': 3, 'row_header_texts': ['Custom development income'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 718208, 'begin': 718204}, 'attributes': [{'location': {'end': 718207, 'begin': 718204}, 'text': '330', 'type': 'Number'}], 'text': '330', 'row_index_end': 3, 'row_header_texts_normalized': ['Custom development income'], 'cell_id': 'bodyCell-718204-718208'}, {'row_header_ids': ['rowHeader-717644-717670'], 'column_index_begin': 3, 'row_index_begin': 3, 'row_header_texts': ['Custom development income'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 718473, 'begin': 718466}, 'attributes': [{'location': {'end': 718471, 'begin': 718467}, 'text': '20.5', 'type': 'Number'}], 'text': '(20.5)', 'row_index_end': 3, 'row_header_texts_normalized': ['Custom development income'], 'cell_id': 'bodyCell-718466-718473'}, {'row_header_ids': ['rowHeader-718747-718753'], 'column_index_begin': 1, 'row_index_begin': 4, 'row_header_texts': ['Total'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 719022, 'begin': 719017}, 'attributes': [{'location': {'end': 719021, 'begin': 719017}, 'text': '$682', 'type': 'Currency'}], 'text': '$682', 'row_index_end': 4, 'row_header_texts_normalized': ['Total'], 'cell_id': 'bodyCell-719017-719022'}, {'row_header_ids': 
['rowHeader-718747-718753'], 'column_index_begin': 2, 'row_index_begin': 4, 'row_header_texts': ['Total'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 719287, 'begin': 719282}, 'attributes': [{'location': {'end': 719286, 'begin': 719282}, 'text': '$742', 'type': 'Currency'}], 'text': '$742', 'row_index_end': 4, 'row_header_texts_normalized': ['Total'], 'cell_id': 'bodyCell-719282-719287'}, {'row_header_ids': ['rowHeader-718747-718753'], 'column_index_begin': 3, 'row_index_begin': 4, 'row_header_texts': ['Total'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 719555, 'begin': 719548}, 'attributes': [{'location': {'end': 719552, 'begin': 719549}, 'text': '8.1', 'type': 'Number'}], 'text': '(8.1)%', 'row_index_end': 4, 'row_header_texts_normalized': ['Total'], 'cell_id': 'bodyCell-719548-719555'}], 'contexts': [{'location': {'end': 713812, 'begin': 713406}, 'text': 'For the year ended December 31: 2015 2014'}, {'location': {'end': 714027, 'begin': 714017}, 'text': 'Yr.-to-Yr.'}, {'location': {'end': 714400, 'begin': 714207}, 'text': 'Percent Change'}, {'location': {'end': 720499, 'begin': 719763}, 'text': 'The timing and amount of Sales and other transfers of IP may vary significantly from period to period depending upon the timing of divestitures, economic conditions, industry consolidation and the timing of new patents and know-how development.'}, {'location': {'end': 720730, 'begin': 720500}, 'text': 'There were no material individual IP transactions in 2015 or 2014.'}, {'location': {'end': 720953, 'begin': 720927}, 'text': 'Other (Income) and Expense'}], 'key_value_pairs': [{'value': [{'location': {'end': 719021, 'begin': 719017}, 'text': '$682', 'cell_id': 'bodyCell-719017-719022'}], 'key': {'location': {'end': 718752, 'begin': 718747}, 'text': 'Total', 'cell_id': 
'rowHeader-718747-718753'}}], 'title': {}, 'column_headers': []},
If you look at the JSON, you can see it covers multiple tables under the Geographic Revenue section: please check pages 41-43 in IBM_Annual_Report_2016.
Made table understanding work end to end using the WD Python SDK, and included a video tutorial showing how someone can use the WD tooling up to the point of querying the project.