ReinV / SCOPE

Search and Chemical Ontology Plotting Environment

Actual searches by year #7

Closed magnuspalmblad closed 4 years ago

magnuspalmblad commented 4 years ago

I just started the following runs, for the TF-IDF normalization:

pre1945, (FIRST_PDATE:[1000-01-01 TO 1944-12-31])
1945, (FIRST_PDATE:[1945-01-01 TO 1945-12-31])
1946, (FIRST_PDATE:[1946-01-01 TO 1946-12-31])
1947, (FIRST_PDATE:[1947-01-01 TO 1947-12-31])
1948, (FIRST_PDATE:[1948-01-01 TO 1948-12-31])
1949, (FIRST_PDATE:[1949-01-01 TO 1949-12-31])
1950, (FIRST_PDATE:[1950-01-01 TO 1950-12-31])
1951, (FIRST_PDATE:[1951-01-01 TO 1951-12-31])
1952, (FIRST_PDATE:[1952-01-01 TO 1952-12-31])
1953, (FIRST_PDATE:[1953-01-01 TO 1953-12-31])
1954, (FIRST_PDATE:[1954-01-01 TO 1954-12-31])
1955, (FIRST_PDATE:[1955-01-01 TO 1955-12-31])
1956, (FIRST_PDATE:[1956-01-01 TO 1956-12-31])
1957, (FIRST_PDATE:[1957-01-01 TO 1957-12-31])
1958, (FIRST_PDATE:[1958-01-01 TO 1958-12-31])
1959, (FIRST_PDATE:[1959-01-01 TO 1959-12-31])
1960, (FIRST_PDATE:[1960-01-01 TO 1960-12-31])
1961, (FIRST_PDATE:[1961-01-01 TO 1961-12-31])
1962, (FIRST_PDATE:[1962-01-01 TO 1962-12-31])
1963, (FIRST_PDATE:[1963-01-01 TO 1963-12-31])
1964, (FIRST_PDATE:[1964-01-01 TO 1964-12-31])
1965, (FIRST_PDATE:[1965-01-01 TO 1965-12-31])
1966, (FIRST_PDATE:[1966-01-01 TO 1966-12-31])
1967, (FIRST_PDATE:[1967-01-01 TO 1967-12-31])
1968, (FIRST_PDATE:[1968-01-01 TO 1968-12-31])
1969, (FIRST_PDATE:[1969-01-01 TO 1969-12-31])
1970, (FIRST_PDATE:[1970-01-01 TO 1970-12-31])
1971, (FIRST_PDATE:[1971-01-01 TO 1971-12-31])
1972, (FIRST_PDATE:[1972-01-01 TO 1972-12-31])
1973, (FIRST_PDATE:[1973-01-01 TO 1973-12-31])
1974, (FIRST_PDATE:[1974-01-01 TO 1974-12-31])
1975, (FIRST_PDATE:[1975-01-01 TO 1975-12-31])
1976, (FIRST_PDATE:[1976-01-01 TO 1976-12-31])
1977, (FIRST_PDATE:[1977-01-01 TO 1977-12-31])
1978, (FIRST_PDATE:[1978-01-01 TO 1978-12-31])
1979, (FIRST_PDATE:[1979-01-01 TO 1979-12-31])
1980, (FIRST_PDATE:[1980-01-01 TO 1980-12-31])
1981, (FIRST_PDATE:[1981-01-01 TO 1981-12-31])
1982, (FIRST_PDATE:[1982-01-01 TO 1982-12-31])
1983, (FIRST_PDATE:[1983-01-01 TO 1983-12-31])
1984, (FIRST_PDATE:[1984-01-01 TO 1984-12-31])
1985, (FIRST_PDATE:[1985-01-01 TO 1985-12-31])
1986, (FIRST_PDATE:[1986-01-01 TO 1986-12-31])
1987, (FIRST_PDATE:[1987-01-01 TO 1987-12-31])
1988, (FIRST_PDATE:[1988-01-01 TO 1988-12-31])
1989, (FIRST_PDATE:[1989-01-01 TO 1989-12-31])
1990, (FIRST_PDATE:[1990-01-01 TO 1990-12-31])
1991, (FIRST_PDATE:[1991-01-01 TO 1991-12-31])
1992, (FIRST_PDATE:[1992-01-01 TO 1992-12-31])
1993, (FIRST_PDATE:[1993-01-01 TO 1993-12-31])
1994, (FIRST_PDATE:[1994-01-01 TO 1994-12-31])
1995, (FIRST_PDATE:[1995-01-01 TO 1995-12-31])
1996, (FIRST_PDATE:[1996-01-01 TO 1996-12-31])
1997, (FIRST_PDATE:[1997-01-01 TO 1997-12-31])
1998, (FIRST_PDATE:[1998-01-01 TO 1998-12-31])
1999, (FIRST_PDATE:[1999-01-01 TO 1999-12-31])
2000, (FIRST_PDATE:[2000-01-01 TO 2000-12-31])
2001, (FIRST_PDATE:[2001-01-01 TO 2001-12-31])
2002, (FIRST_PDATE:[2002-01-01 TO 2002-12-31])
2003, (FIRST_PDATE:[2003-01-01 TO 2003-12-31])
2004, (FIRST_PDATE:[2004-01-01 TO 2004-12-31])
2005, (FIRST_PDATE:[2005-01-01 TO 2005-12-31])
2006, (FIRST_PDATE:[2006-01-01 TO 2006-12-31])
2007, (FIRST_PDATE:[2007-01-01 TO 2007-12-31])
2008, (FIRST_PDATE:[2008-01-01 TO 2008-12-31])
2009, (FIRST_PDATE:[2009-01-01 TO 2009-12-31])
2010, (FIRST_PDATE:[2010-01-01 TO 2010-12-31])
2011, (FIRST_PDATE:[2011-01-01 TO 2011-12-31])
2012, (FIRST_PDATE:[2012-01-01 TO 2012-12-31])
2013, (FIRST_PDATE:[2013-01-01 TO 2013-12-31])
2014, (FIRST_PDATE:[2014-01-01 TO 2014-12-31])
2015, (FIRST_PDATE:[2015-01-01 TO 2015-12-31])
2016, (FIRST_PDATE:[2016-01-01 TO 2016-12-31])
2017, (FIRST_PDATE:[2017-01-01 TO 2017-12-31])
2018, (FIRST_PDATE:[2018-01-01 TO 2018-12-31])
2019, (FIRST_PDATE:[2019-01-01 TO 2019-12-31])
2020, (FIRST_PDATE:[2020-01-01 TO 2020-12-31])
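
For reference, a list like this does not need to be typed by hand. The following is a minimal sketch of how the label/query pairs could be generated; the function name `year_queries` and its parameters are made up for illustration and are not part of search_query.py.

```python
def year_queries(first_year=1945, last_year=2020):
    """Return (label, query) pairs: one bin for everything before
    first_year, then one bin per calendar year up to last_year."""
    # Open-ended bin for the older literature, labelled e.g. "pre1945".
    queries = [
        (f"pre{first_year}",
         f"(FIRST_PDATE:[1000-01-01 TO {first_year - 1}-12-31])")
    ]
    # One query per calendar year, inclusive at both ends.
    for year in range(first_year, last_year + 1):
        queries.append(
            (str(year), f"(FIRST_PDATE:[{year}-01-01 TO {year}-12-31])")
        )
    return queries


# Print the pairs in the same "label, query" layout as the list above.
for label, query in year_queries():
    print(f"{label}, {query}")
```

Changing `last_year` would be enough to extend the list when later years need to be added.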

ReinV commented 4 years ago

With the search_query script or your own script?

magnuspalmblad commented 4 years ago

With search_query.py, so that the results are consistent.

The results may change over time, even for older literature, as the text mining is repeated and older material gets scanned and made available. But it would be good to have a recently retrieved background for TF-IDF for manuscript submission. Probably it is faster to read this in as a single static table, but if it does not take too long to generate it, it could be done every time from these searches_by_year results (makes it easier to update them, and make selections of specific years if wanted).
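
To make the trade-off concrete, here is a rough sketch of building the background on the fly from per-year counts, with an optional selection of years. It assumes the per-year results have already been reduced to a mapping from year label to term counts; that in-memory layout, the function names, and the smoothed weighting formula are illustrative assumptions, not necessarily what SCOPE uses.

```python
import math
from collections import Counter


def merge_background(per_year, years=None):
    """Sum per-year term counts into one background table.

    per_year: dict mapping a year label (e.g. "1963") to a dict of
              term -> count (assumed layout, for illustration only).
    years:    optional iterable of labels to include; None means all.
    """
    labels = per_year.keys() if years is None else years
    background = Counter()
    for label in labels:
        background.update(per_year[label])
    return background


def tf_idf_weight(term_count, background_count, n_background_docs):
    """One common textbook TF-IDF variant with +1 smoothing in the
    denominator, so terms absent from the background do not divide by zero."""
    return term_count * math.log(n_background_docs / (1 + background_count))
```

For example, `merge_background(per_year, years=["1962", "1963"])` would restrict the background to two years, which is the kind of selection that a single static table would not allow without regenerating it.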

ReinV commented 4 years ago

> With search_query.py, so that the results are consistent.

Ok. In that case I need to adjust the code of make_table.py because search_query.py saves results differently than the current searches_by_year files.

> Probably it is faster to read this in as a single static table, but if it does not take too long to generate it, it could be done every time from these searches_by_year results (makes it easier to update them, and make selections of specific years if wanted).

I think it will take a while, but if the make_table.py script is run once for a whole folder of search results, then I think it is an acceptable wait time.
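
As a sketch of what "a whole folder" could look like in code: the loop below reads every delimited file in a folder and counts identifiers per file. The file layout (one row per hit, identifier in the first column), the `*.csv` pattern, and the function name are hypothetical; the real search_query.py output format may well differ, so the parsing line would need adjusting.

```python
import csv
from collections import Counter
from pathlib import Path


def counts_from_folder(folder, pattern="*.csv", delimiter=",", id_column=0):
    """Return {file stem: Counter of identifiers} for each file in folder.

    Assumes (hypothetically) one row per hit, with the identifier
    stored in column `id_column`.
    """
    per_file = {}
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="") as handle:
            reader = csv.reader(handle, delimiter=delimiter)
            # Skip empty rows; count each identifier occurrence.
            per_file[path.stem] = Counter(
                row[id_column] for row in reader if row
            )
    return per_file
```

Because the result is keyed by file name, selecting or re-running a single year only touches that one file.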

magnuspalmblad commented 4 years ago

Right, I saved them as CSVs... But for consistency, I think it is best to use the same format for the searches by year as all other results. The existing files can be converted while waiting for the new searches to finish.

magnuspalmblad commented 4 years ago

Chugging along nicely - on 1963 now...

When I get a "connection failed" error - does the script retry, or will it miss some data then? I get this error occasionally.

ReinV commented 4 years ago

It will retry until it has retrieved the information, so no information is lost, but I thought the script should print the "error" message anyway.
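
Roughly, the behaviour described here is a retry-until-success loop that still reports each failure. The sketch below illustrates that pattern only; `fetch_with_retry`, its parameters, and the message wording are made up for this example and are not copied from search_query.py.

```python
import time


def fetch_with_retry(fetch, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch() until it succeeds and return its result.

    Each failed attempt is reported, then retried after a pause,
    so a dropped connection costs time but loses no data.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return fetch()
        except ConnectionError as error:
            # Report the failure, wait, and try the same request again.
            print(f"connection failed (attempt {attempt}): {error}; retrying")
            sleep(delay_seconds)
```

The `sleep` argument is only there so the pause can be swapped out when trying the function without waiting.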

magnuspalmblad commented 4 years ago

Excellent!

magnuspalmblad commented 4 years ago

Now on 1988... As you mentioned, the searches take longer the more results there are. They are now completing at a rate of 1 year/day. If you want to start a sequence going backwards from 2019, we can meet in the middle.

ReinV commented 4 years ago

Good idea, I will start the search.

magnuspalmblad commented 4 years ago

Searches are done up until April 2020. Will update the last few years, but closing the issue now.