Closed: ebergam closed this issue 3 years ago
@ebergam Oh, I think there is a misunderstanding about what slicing the timepoints evenly means. It does not mean that all timepoints must have the same number of documents. In fact, each timepoint can have a different number of docs. Slicing the time evenly means that the spacing between adjacent timepoints is the same. This is a major limitation of DTM; other models, such as cDTM (continuous DTM), can accept arbitrary time intervals.
So, for example, if you use DTModel and add documents from 2000-2005 for `t=0` and documents from 2005-2010 for `t=1`, you have to add documents from 2010-2015 for `t=2`.
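To make the distinction concrete, here is a minimal sketch in plain Python (no topic-modeling library involved). The helper `year_to_timepoint`, the half-open slices `[2000, 2005)`, `[2005, 2010)`, `[2010, 2015)`, and the toy corpus are all assumptions made for illustration, not part of any library. It shows that equal-width time slices can still hold different numbers of documents.

```python
from collections import Counter


def year_to_timepoint(year, start=2000, width=5):
    """Map a year to the index of its equal-width time slice.

    Hypothetical helper: slices are assumed half-open, so 2005 falls
    into t=1, not t=0.
    """
    if year < start:
        raise ValueError(f"year {year} is before the first slice ({start})")
    return (year - start) // width


# Toy corpus of (publication year, tokens); the data is made up.
docs = [
    (2001, ["topic", "model"]),
    (2003, ["dynamic", "topic"]),
    (2004, ["time", "slice"]),
    (2007, ["corpus", "document"]),
    (2011, ["word", "distribution"]),
    (2014, ["topic", "evolution"]),
]

# One timepoint index per document; the spacing between slices is fixed.
timepoints = [year_to_timepoint(year) for year, _ in docs]

# The number of documents per timepoint is free to vary.
docs_per_timepoint = Counter(timepoints)

for t in sorted(docs_per_timepoint):
    print(f"t={t}: {docs_per_timepoint[t]} document(s)")
```

Each document carries its own timepoint index, so an imbalanced corpus is fine as long as the slices themselves are equally wide. If DTModel here is tomotopy's `DTModel`, these indices would presumably be what you pass as the timepoint when adding each document.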
Hi @bab2min, thanks a lot for your kind reply, now it's much clearer!
As I understand it, the DTM model takes a number of timepoints `t = [0, T)` and slices the corpus evenly according to that integer. Hence, with `N` documents, each time period would include `N/T` documents. In this way, it's not possible to have more documents in period `t=1` than there are in `t=2`. Am I misunderstanding?
If I am not, this could be much more flexible (especially for applied work) if it were possible to pass an array (or list) of `t` timepoints, where `T` is equal to the number of documents in the corpus and each `t` indicates the time period of the corresponding document. That way, the DTModel could easily be applied to time-imbalanced datasets, which I believe represent a lot of real-world cases. What do you think?

Thanks a lot