datasciencecampus / pygrams

Extracts key terminology (n-grams) from any large collection of documents (>1000) and forecasts emergence
https://datasciencecampus.github.io/pygrams

CPC filter does not produce predictions #265

IanGrimstead closed this issue 4 years ago

IanGrimstead commented 5 years ago

When run with, for example, -cpc Y02, pygrams fails to produce any predictions. The message produced is: "Analysis of emergent failed as no terms were detected, likely because -mpq is too large for dataset provided".

This should produce sufficient terms for analysis! patstat scored 3,898 for "solar cell", which indicates more than 3,898 patents.
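
One way to check whether the CPC filter itself is leaving too few documents is to count how many records survive a prefix match before any term analysis runs. The following is a minimal sketch under stated assumptions: the "cpc" field name, the record layout, and the sample records are all hypothetical and are not taken from pygrams.

```python
from typing import Iterable, List


def matches_cpc(doc_codes: Iterable[str], prefix: str) -> bool:
    """True if any classification code on the document starts with the prefix."""
    return any(code.startswith(prefix) for code in doc_codes)


def filter_by_cpc(docs: List[dict], prefix: str) -> List[dict]:
    """Keep only documents with at least one CPC code under the prefix.

    The "cpc" key is an assumed field name used for illustration only.
    """
    return [doc for doc in docs if matches_cpc(doc.get("cpc", []), prefix)]


# Made-up records, purely to exercise the filter.
sample_docs = [
    {"id": 1, "cpc": ["Y02E10/50", "H01L31/04"]},
    {"id": 2, "cpc": ["G06F17/30"]},
    {"id": 3, "cpc": ["Y02T10/12"]},
    {"id": 4},  # record with no classification codes at all
]

kept = filter_by_cpc(sample_docs, "Y02")
print(f"{len(kept)} of {len(sample_docs)} documents match the prefix")
```

If the surviving count is far smaller than expected, the problem is likely in the filter rather than in the -mpq threshold.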

IanGrimstead commented 5 years ago

Top 250 terms:

1. solar cell                     3898.385468                                 
2. internal combustion engine     2725.165515
3. heat exchanger                 2692.492992
4. exhaust gas                    2573.558060
5. water tank                     2465.512626
6. solar energy                   2161.601373
7. fuel cell                      2005.073370
8. solar panel                    1748.711242
9. storage battery                1737.360645
10. carbon dioxide                 1510.974612
11. combustion chamber             1450.351037
12. solar water heater             1424.209037
13. wind turbine                   1408.786827
14. solar cell module              1386.212055
15. electric vehicle               1380.530256
16. solar battery                  1326.504379
17. power generation               1307.750727
18. solar cell panel               1199.961128
19. power consumption              1167.900741
20. electric energy                1150.989684
21. electric power                 1091.443106
22. hot water                      1070.387499
23. water pump                     1040.381805
24. electric motor                 1019.813735
25. power generator                1016.744179
26. negative electrode             990.101312
27. waste heat                     961.481585
28. water inlet                    954.354360
29. positive electrode             947.510116
30. power generation system        916.858230
31. high temperature               916.655595
32. wind power generation          912.875823
33. heat exchange                  907.218343
34. control circuit                905.240054
35. power source                   886.852957
36. wind power                     885.280292
37. independent claim              869.446219
38. energy consumption             858.139309
39. wind energy                    848.903794
40. flue gas                       848.893713
41. power generation device        833.844406
42. water storage tank             831.056698
43. air inlet                      828.236597
44. street lamp                    817.249118
45. temperature sensor             815.241718
46. .1                             797.782551
47. control module                 792.599527
48. water outlet                   789.709926
49. power grid                     782.778168
50. box body                       766.672540
51. water pipe                     759.908123
52. heat energy                    750.947605
53. secondary battery              750.113074
54. hybrid vehicle                 746.430042
55. photovoltaic cell              743.897428
56. wind power generator           729.921915
57. photovoltaic module            727.778673
58. waste water                    724.429621
59. organic fertilizer             708.403824
60. base station                   705.731004
61. heat collector                 698.792369
62. control signal                 690.487850
63. heat pump                      686.583436
64. battery pack                   656.607713
65. electrode active material      655.211742
66. heat source                    654.173761
67. exhaust pipe                   646.528290
68. storage tank                   640.998000
69. photovoltaic power generation  627.854949
70. output voltage                 627.536742
71. water flow                     627.354233
72. thin film                      620.135245
73. energy storage                 619.358398
74. electrical energy              616.832019
75. electrode layer                614.545755
76. solar collector                612.702082
77. fuel gas                       611.754600
78. heat pipe                      605.875689
79. air outlet                     605.599523
80. low temperature                604.213628
81. waste gas                      604.211095
82. tank body                      598.152685
83. heat storage                   597.367626
84. semiconductor layer            591.139233
85. control mean                   589.320214
86. generate power                 585.966472
87. wind wheel                     582.018147
88. generate electricity           574.275090
89. fly ash                        568.431479
90. supply power                   567.793909
91. high efficiency                567.159129
92. air flow                       567.090026
93. respectively connect           561.751477
94. furnace body                   560.749391
95. environmental protection       558.093089
96. power plant                    554.048313
97. air conditioner                553.527198
98. power system                   551.769234
99. rotary shaft                   545.392484
100. electronic device              540.617858
101. dye-sensitized solar cell      539.835236
102. direct current                 537.030295
103. output shaft                   535.062842
104. tail gas                       533.661235
105. battery cell                   531.309688
106. diesel engine                  527.978047
107. water turbine                  526.935513
108. mobile terminal                526.449319
109. water inlet pipe               519.129915
110. catalyst layer                 513.534057
111. water level                    509.428091
112. support frame                  508.946387
113. power converter                503.872951
114. main shaft                     503.172016
115. nutrient solution              501.456070
116. organic waste                  501.359895
117. control part                   497.698871
118. motor generator                495.929397
119. water heater                   495.042425
120. base material                  490.886602
121. heating device                 490.807583
122. motor vehicle                  487.697947
123. thermal energy                 487.365599
124. power generate                 480.886602
125. current collector              480.498466
126. flow rate                      480.250265
127. energy storage device          479.822555
128. solar module                   476.455115
129. outer wall                     473.539247
130. heat transfer                  470.100209
131. production process             469.839298
132. control valve                  469.598798
133. water outlet pipe              468.677926
134. lamp pole                      468.494090
135. energy source                  464.428650
136. high pressure                  462.117209
137. electric generator             461.440636
138. water supply                   460.044860
139. water quality                  458.332518
140. rotor blade                    458.221324
141. power supply system            454.121316
142. following raw material         453.115121
143. waste heat recovery            451.826319
144. intake air                     448.508742
145. lead lamp                      447.497470
146. photoelectric conversion element 444.808052
147. wind driven generator          444.313976
148. organic matter                 442.732698
149. intake valve                   442.181502
150. aqueous solution               441.767663
151. gas turbine                    439.605865
152. electric automobile            438.265424
153. energy saving                  437.555792
154. natural gas                    436.210361
155. real time                      434.895425
156. heat preservation              431.174451
157. output power                   430.538080
158. bottom plate                   425.602725
159. metal oxide                    423.739167
160. fan blade                      419.997983
161. arrange inside                 418.433772
162. rotating shaft                 417.906815
163. semiconductor substrate        417.796411
164. hydrogen gas                   417.010902
165. greatly reduce                 416.496457
166. reactive power                 414.410362
167. fuel injection                 413.404648
168. solar heat                     413.006954
169. permanent magnet               412.341079
170. composite material             410.517966
171. steam turbine                  410.491335
172. fermentation tank              406.836692
173. heating system                 406.806536
174. silicon wafer                  403.925780
175. electrical power               403.237074
176. wind speed                     401.861847
177. flow path                      400.443573
178. following component            399.974263
179. power output                   399.497556
180. power supply device            399.196648
181. battery module                 399.054859
182. gas outlet                     398.650948
183. fuel cell system               397.459313
184. utilization rate               396.301261
185. monitoring system              396.267645
186. greatly improve                395.903524
187. drive motor                    394.174167
188. rotation speed                 394.107614
189. waste material                 392.449389
190. power factor                   389.645491
191. lithium secondary battery      388.783108
192. fuel tank                      387.011802
193. vacuum tube                    386.678268
194. sea water                      384.245749
195. fuel supply                    382.201816
196. rotational speed               381.909216
197. electrolyte membrane           380.940992
198. cooling water                  379.117081
199. fresh water                    378.894018
200. save energy                    378.795208
201. base plate                     377.069280
202. hot air                        375.749095
203. energy conservation            373.681592
204. photoelectric conversion efficiency 373.350235
205. water body                     372.004220
206. ionic liquid                   371.909667
207. super capacitor                369.779579
208. glass substrate                369.445319
209. exhaust valve                  368.069327
210. organic solvent                367.616663
211. alternate current              366.837328
212. wave energy                    366.699840
213. cold water                     365.597726
214. drive shaft                    365.477588
215. exhaust passage                364.925142
216. heat collection                364.570925
217. low pressure                   364.191988
218. waste plastic                  363.736430
219. film solar cell                362.035711
220. buffer layer                   361.868401
221. thin film solar                361.837527
222. economic benefit               361.731333
223. heat medium                    360.176635
224. kinetic energy                 360.145219
225. communication module           360.122172
226. cross section                  359.275563
227. bottom surface                 359.088357
228. gas turbine engine             358.812078
229. carbon monoxide                358.542572
230. photovoltaic device            357.733627
231. power storage device           357.382233
232. electrode material             356.105336
233. work medium                    355.757720
234. electricity generation         353.865175
235. solenoid valve                 351.748377
236. input shaft                    351.266785
237. input voltage                  350.591268
238. drive circuit                  350.412270
239. drive force                    350.072149
240. inner surface                  349.524288
241. wall body                      349.515953
242. solar power generation         349.442624
243. die erfindung betrifft         349.205528
244. management system              348.324563
245. solar heat collector           348.311921
246. photoelectric conversion layer 347.680008
247. output terminal                346.646040
248. 1-2 part                       346.165695
249. electromagnetic valve          345.713553

thanasions commented 5 years ago

Is that still the case? I thought we solved this?

IanGrimstead commented 5 years ago

Checked on develop:

python pygrams.py -cpc Y02 -emt -dh publication_date -ds USPTO-random-100000.pkl.bz2

generated:

Running pipeline for "emergent"                                                 
Analysis of emergent failed as no terms were detected, likely because -mpq is too large for dataset provided
Traceback (most recent call last):
  File "pygrams.py", line 263, in <module>
    main(sys.argv[1:])
  File "pygrams.py", line 216, in main
    emergence=emergence)
TypeError: 'NoneType' object is not iterable

So alas, something's still up. It should at least report gracefully when it fails, but I was hoping it'd get something with 100k patents...
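
The traceback suggests that the emergence step handed back None and the caller then tried to iterate or unpack it. Below is a minimal sketch of failing gracefully in that situation. It assumes a hypothetical run_emergence function that returns None when no terms pass the threshold; the function names, arguments, and message text are illustrative and are not pygrams' actual API.

```python
import sys
from typing import Dict, List, Optional, Tuple


def run_emergence(term_counts: Dict[str, int],
                  min_count: int) -> Optional[List[Tuple[str, int]]]:
    """Hypothetical stand-in for the emergence step.

    Returns None when no term reaches min_count, mirroring the failure
    mode implied by the traceback.
    """
    kept = [(term, n) for term, n in term_counts.items() if n >= min_count]
    if not kept:
        return None
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def main(term_counts: Dict[str, int], min_count: int) -> int:
    """Return an exit code rather than letting a None result raise TypeError."""
    emergent = run_emergence(term_counts, min_count)
    if emergent is None:
        # Report the failure and stop before anything tries to iterate None.
        print("No emergent terms detected; try a smaller threshold "
              "or a larger dataset.", file=sys.stderr)
        return 1
    for term, n in emergent:
        print(f"{term}\t{n}")
    return 0
```

Checking for None at the call site keeps the existing "no terms detected" message as the only output, instead of following it with a stack trace.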