bnosac / doc2vec

Distributed Representations of Sentences and Documents

Error: Too many files are open #18

Closed jusme326 closed 3 years ago

jusme326 commented 3 years ago

library(doc2vec)
library(stringr)

text <- c("We gather tonight knowing that this generation of heroes has made the United States safer and more respected around the world. For the first time in nine years, there are no Americans fighting in Iraq. (Applause.) For the first time in two decades, Osama bin Laden is not a threat to this country. (Applause.) Most of al Qaeda's top lieutenants have been defeated. The Taliban's momentum has been broken, and some troops in Afghanistan have begun to come home. These achievements are a testament to the courage, selflessness and teamwork of America's Armed Forces. At a time when too many of our institutions have let us down, they exceed all expectations. They're not consumed with personal ambition. They don't obsess over their differences. They focus on the mission at hand. They work together. Imagine what we could accomplish if we followed their example. (Applause.) Think about the America within our reach A country that leads the world in educating its people. An America that attracts a new generation of high-tech manufacturing and high-paying jobs. A future where we're in control of our own energy, and our security and prosperity aren't so tied to unstable parts of the world. An economy built to last, where hard work pays off, and responsibility is rewarded. We can do this. I know we can, because we've done it before. At the end of World War II, when another generation of heroes returned home from combat, they built the strongest economy and middle class the world has ever known. My grandfather, a veteran of Patton's Army, got the chance to go to college on the GI Bill. My grandmother, who worked on a bomber assembly line, was part of a workforce that turned out the best products on Earth. The two of them shared the optimism of a nation that had triumphed over a depression and fascism. 
They understood they were part of something larger; that they were contributing to a story of success that every American had a chance to share -- the basic American promise that if you worked hard, you could do well enough to raise a family, own a home, send your kids to college, and put a little away for retirement. The defining issue of our time is how to keep that promise alive. No challenge is more urgent. No debate is more important. We can either settle for a country where a shrinking number of people do really well while a growing number of Americans barely get by, or we can restore an economy where everyone gets a fair shot, and everyone does their fair share, and everyone plays by the same set of rules. What's at stake aren't Democratic values or Republican values, but American values. And we have to reclaim them. Let's remember how we got here. Long before the recession, jobs and manufacturing began leaving our shores. Technology made businesses more efficient, but also made some jobs obsolete. Folks at the top saw their incomes rise like never before, but most hardworking Americans struggled with costs that were growing, paychecks that weren't, and personal debt that kept piling up. In 2008, the house of cards collapsed. We learned that mortgages had been sold to people who couldn't afford or understand them. Banks had made huge bets and bonuses with other people's money. Regulators had looked the other way, or didn't have the authority to stop the bad behavior. It was wrong. It was irresponsible. And it plunged our economy into a crisis that put millions out of work, saddled us with more debt, and left innocent, hardworking Americans holding the bag. In the six months before I took office, we lost nearly 4 million jobs. And we lost another 4 million before our policies were in full effect. Those are the facts. But so are these: In the last 22 months, businesses have created more than 3 million jobs. Last year, they created the most jobs since 2005. 
American manufacturers are hiring again, creating jobs for the first time since the late 1990s. Together, we've agreed to cut the deficit by more than $2 trillion. And we've put in place new rules to hold Wall Street accountable, so a crisis like this never happens again. (Applause.) The state of our Union is getting stronger. And we've come too far to turn back now. As long as I'm President, I will work with anyone in this chamber to build on this momentum. But I intend to fight obstruction with action, and I will oppose any effort to return to the very same policies that brought on this economic crisis in the first place. (Applause.) No, we will not go back to an economy weakened by outsourcing, bad debt, and phony financial profits. Tonight, I want to speak about how we move forward, and lay out a blueprint for an economy that's built to last -- an economy built on American manufacturing, American energy, skills for American workers, and a renewal of American values. Now, this blueprint begins with American manufacturing. On the day I took office, our auto industry was on the verge of collapse. Some even said we should let it die. With a million jobs at stake, I refused to let that happen. In exchange for help, we demanded responsibility. We got workers and automakers to settle their differences. We got the industry to retool and restructure. Today, General Motors is back on top as the world's number-one automaker. Chrysler has grown faster in the U.S. than any major car company. Ford is investing billions in U.S. plants and factories. And together, the entire industry added nearly 160,000 jobs. Tonight marks the eighth year that I've come here to report on the State of the Union. And for this final one, I'm going to try to make it a little shorter. I know some of you are antsy to get back to Iowa. I've been there. I'll be shaking hands afterwards if you want some tips. (Laughter.) 
And I understand that because it's an election season, expectations for what we will achieve this year are low. But, Mr. Speaker, I appreciate the constructive approach that you and the other leaders took at the end of last year to pass a budget and make tax cuts permanent for working families. So I hope we can work together this year on some bipartisan priorities like criminal justice reform -- (applause) -- and helping people who are battling prescription drug abuse and heroin abuse. (Applause.) So, who knows, we might surprise the cynics again. But tonight, I want to go easy on the traditional list of proposals for the year ahead. Don't worry, I've got plenty, from helping students learn to write computer code to personalizing medical treatments for patients. And I will keep pushing for progress on the work that I believe still needs to be done. Fixing a broken immigration system. (Applause.) Protecting our kids from gun violence. (Applause.) Equal pay for equal work. (Applause.) Paid leave. (Applause.) Raising the minimum wage. (Applause.) All these things still matter to hardworking families. They're still the right thing to do. And I won't let up until they get done. But for my final address to this chamber, I don't want to just talk about next year. I want to focus on the next five years, the next 10 years, and beyond. I want to focus on our future. We live in a time of extraordinary change -- change that's reshaping the way we live, the way we work, our planet, our place in the world. It's change that promises amazing medical breakthroughs, but also economic disruptions that strain working families. It promises education for girls in the most remote villages, but also connects terrorists plotting an ocean away. It's change that can broaden opportunity, or widen inequality. And whether we like it or not, the pace of this change will only accelerate. 
America has been through big changes before -- wars and depression, the influx of new immigrants, workers fighting for a fair deal, movements to expand civil rights. Each time, there have been those who told us to fear the future who claimed we could slam the brakes on change; who promised to restore past glory if we just got some group or idea that was threatening America under control. And each time, we overcame those fears. We did not, in the words of Lincoln, adhere to the dogmas of the quiet past. Instead we thought anew, and acted anew. We made change work for us, always extending America's promise outward, to the next frontier, to more people. And because we did -- because we saw opportunity where others saw only peril -- we emerged stronger and better than before. What was true then can be true now. Our unique strengths as a nation -- our optimism and work ethic, our spirit of discovery, our diversity, our commitment to rule of law -- these things give us everything we need to ensure prosperity and security for generations to come. In fact, it's that spirit that made the progress of these past seven years possible. It's how we recovered from the worst economic crisis in generations. It's how we reformed our health care system, and reinvented our energy sector; how we delivered more care and benefits to our troops and veterans, and how we secured the freedom in every state to marry the person we love. But such progress is not inevitable. It's the result of choices we make together. And we face such choices right now. Will we respond to the changes of our time with fear, turning inward as a nation, turning against each other as a people? Or will we face the future with confidence in who we are, in what we stand for, in the incredible things that we can do together? So let's talk about the future, and four big questions that I believe we as a country have to answer -- regardless of who the next President is, or who controls the next Congress. 
First, how do we give everyone a fair shot at opportunity and security in this new economy? (Applause.) Second, how do we make technology work for us, and not against us -- especially when it comes to solving urgent challenges like climate change? (Applause.) Third, how do we keep America safe and lead the world without becoming its policeman? (Applause.) And finally, how can we make our politics reflect what's best in us, and not what's worst? Let me start with the economy, and a basic fact: The United States of America, right now, has the strongest, most durable economy in the world. (Applause.) We're in the middle of the longest streak of private sector job creation in history. (Applause.) More than 14 million new jobs, the strongest two years of job growth since the 90s, an unemployment rate cut in half. Our auto industry just had its best year ever. (Applause.) That's just part of a manufacturing surge that's created nearly 900,000 new jobs in the past six years. And we've done all this while cutting our deficits by almost three-quarters. (Applause.) Anyone claiming that America's economy is in decline is peddling fiction. (Applause.) Now, what is true -- and the reason that a lot of Americans feel anxious -- is that the economy has been changing in profound ways, changes that started long before the Great Recession hit; changes that have not let up. Today, technology doesn't just replace jobs on the assembly line, but any job where work can be automated. Companies in a global economy can locate anywhere, and they face tougher competition. As a result, workers have less leverage for a raise. Companies have less loyalty to their communities. And more and more wealth and income is concentrated at the very top. All these trends have squeezed workers, even when they have jobs; even when the economy is growing. 
It's made it harder for a hardworking family to pull itself out of poverty, harder for young people to start their careers, tougher for workers to retire when they want to. And although none of these trends are unique to America, they do offend our uniquely American belief that everybody who works hard should get a fair shot. For the past seven years, our goal has been a growing economy that works also better for everybody. We've made progress. But we need to make more. And despite all the political arguments that we've had these past few years, there are actually some areas where Americans broadly agree. We gather tonight knowing that this generation of heroes has made the United States safer and more respected around the world. For the first time in nine years, there are no Americans fighting in Iraq. (Applause.) For the first time in two decades, Osama bin Laden is not a threat to this country. (Applause.) Most of al Qaeda's top lieutenants have been defeated. The Taliban's momentum has been broken, and some troops in Afghanistan have begun to come home. These achievements are a testament to the courage, selflessness and teamwork of America's Armed Forces. At a time when too many of our institutions have let us down, they exceed all expectations. They're not consumed with personal ambition. They don't obsess over their differences. They focus on the mission at hand. They work together. Imagine what we could accomplish if we followed their example. (Applause.) Think about the America within our reach: A country that leads the world in educating its people. An America that attracts a new generation of high-tech manufacturing and high-paying jobs. A future where we're in control of our own energy, and our security and prosperity aren't so tied to unstable parts of the world. An economy built to last, where hard work pays off, and responsibility is rewarded. We can do this. I know we can, because we've done it before. 
At the end of World War II, when another generation of heroes returned home from combat, they built the strongest economy and middle class the world has ever known. (Applause.) My grandfather, a veteran of Patton's Army, got the chance to go to college on the GI Bill. My grandmother, who worked on a bomber assembly line, was part of a workforce that turned out the best products on Earth. The two of them shared the optimism of a nation that had triumphed over a depression and fascism. They understood they were part of something larger; that they were contributing to a story of success that every American had a chance to share -- the basic American promise that if you worked hard, you could do well enough to raise a family, own a home, send your kids to college, and put a little away for retirement. The defining issue of our time is how to keep that promise alive. No challenge is more urgent. No debate is more important. We can either settle for a country where a shrinking number of people do really well while a growing number of Americans barely get by, or we can restore an economy where everyone gets a fair shot, and everyone does their fair share, and everyone plays by the same set of rules. (Applause.) What's at stake aren't Democratic values or Republican values, but American values. And we have to reclaim them. Let's remember how we got here. Long before the recession, jobs and manufacturing began leaving our shores. Technology made businesses more efficient, but also made some jobs obsolete. Folks at the top saw their incomes rise like never before, but most hardworking Americans struggled with costs that were growing, paychecks that weren't, and personal debt that kept piling up. In 2008, the house of cards collapsed. We learned that mortgages had been sold to people who couldn't afford or understand them. Banks had made huge bets and bonuses with other people's money. Regulators had looked the other way, or didn't have the authority to stop the bad behavior. 
It was wrong. It was irresponsible. And it plunged our economy into a crisis that put millions out of work, saddled us with more debt, and left innocent, hardworking Americans holding the bag. In the six months before I took office, we lost nearly 4 million jobs. And we lost another 4 million before our policies were in full effect. ")

text_list_full <- list()
text_list <- list()

====Error 1: Error 1

generate_list_fun <- function(i) {
  for (x in 1:10) {
    t_1 <- word(text, x + i + 10, x + i + 30)
    t_2 <- word(text, x + i + 4, x + i + 28)
    t_3 <- word(text, x + i + 5, x + i + 30)
    t_4 <- word(text, x + i + 20, x + i + 50)
    t_5 <- word(text, x + i + 2, x + i + 50)

    text_list[[x]] <- data.frame(doc_id = sprintf(c(paste0("doc_", x+4*x+1),
                                                    paste0("doc_", x+4*x+2),
                                                    paste0("doc_", x+4*x+3),
                                                    paste0("doc_", x+4*x+4),
                                                    paste0("doc_", x+4*x+5))),
                                 text = c(t_1, t_2, t_3, t_4, t_5),
                                 year = 2000 + i)
  }

  text_list_full[[i]] <- text_list
}

text_list_full <- lapply(1:20, generate_list_fun)

Create p2vec model

model_p2v_list <- list()
temp_list <- list()

paragraph2vec_list_fun <- function (i){
  for (x in 1:(length(text_list_full[[i]]))) {
    print(showConnections(all = FALSE))
    print(paste(i, "_", x))

    model <- paragraph2vec(text_list_full[[i]][[x]], type = "PV-DBOW",
                           dim = 200, iter = 20,
                           min_count = 3, lr = 0.05, threads = 4)
    temp_list[[x]] <- model
  }

  model_p2v_list[[i]] <- temp_list
}

model_p2v_list <- lapply(1:length(text_list_full), paragraph2vec_list_fun) # This gives the error "training data file not found"

model_p2v_list <- lapply(1:7, paragraph2vec_list_fun) # This works, but afterwards, it will start giving me an error again.

model_p2v_list_2 <- lapply(5:7, paragraph2vec_list_fun)
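
A side note on the list-building step above: `generate_list_fun` only works with `lapply` because an assignment returns its right-hand side, so the function happens to return `text_list`. The following is a minimal sketch of the same nested structure built with explicit return values. It is an illustration, not the reporter's code: `word_window` and `make_year_docs` are hypothetical helper names, `word_window` is a base R stand-in that only roughly mimics `stringr::word(text, start, end)` for text separated by single spaces, and `sample_text` is a toy string rather than the speech text.

```r
# Hypothetical helper: join words `from`..`to` of one string.
# Roughly what stringr::word(text, from, to) does for single-space-separated text.
word_window <- function(text, from, to) {
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  paste(words[from:to], collapse = " ")
}

# Word offsets taken from the t_1..t_5 lines of the original function.
starts <- c(10, 4, 5, 20, 2)
ends   <- c(30, 28, 30, 50, 50)

# Hypothetical helper: build the n data frames for one value of i
# and return them as a list instead of assigning to an outer variable.
make_year_docs <- function(i, text, n = 10) {
  lapply(seq_len(n), function(x) {
    data.frame(
      doc_id = paste0("doc_", 5 * x + 1:5),  # same ids as x + 4 * x + 1..5
      text = mapply(function(s, e) word_window(text, x + i + s, x + i + e),
                    starts, ends),
      year = 2000 + i,
      stringsAsFactors = FALSE
    )
  })
}

# Toy input: 100 space-separated tokens, enough for the largest window.
sample_text <- paste(paste0("w", 1:100), collapse = " ")
docs <- lapply(1:20, make_year_docs, text = sample_text)
```

With this shape, `docs[[i]][[x]]` is a data frame with `doc_id` and `text` columns that can be passed on in the same way as `text_list_full[[i]][[x]]`.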

jwijffels commented 3 years ago

I can't reproduce this on my Windows machine. When I run the code below, print(showConnections(all = FALSE)) shows that no connections are left open. Can you provide a reproducible example that fails?

library(stringr)
library(doc2vec)

text <- c("Tonight marks the eighth year that I’ve come here to report on the State of the Union. And for this final one, I’m going to try to make it a little shorter. (Applause.) I know some of you are antsy to get back to Iowa. (Laughter.) I've been there. I'll be shaking hands afterwards if you want some tips. (Laughter.) And I understand that because it’s an election season, expectations for what we will achieve this year are low. But, Mr. Speaker, I appreciate the constructive approach that you and the other leaders took at the end of last year to pass a budget and make tax cuts permanent for working families. So I hope we can work together this year on some bipartisan priorities like criminal justice reform -- (applause) -- and helping people who are battling prescription drug abuse and heroin abuse. (Applause.) So, who knows, we might surprise the cynics again. But tonight, I want to go easy on the traditional list of proposals for the year ahead. Don’t worry, I’ve got plenty, from helping students learn to write computer code to personalizing medical treatments for patients. And I will keep pushing for progress on the work that I believe still needs to be done. Fixing a broken immigration system. (Applause.) Protecting our kids from gun violence. (Applause.) Equal pay for equal work. (Applause.) Paid leave. (Applause.) Raising the minimum wage. (Applause.) All these things still matter to hardworking families. They’re still the right thing to do. And I won't let up until they get done.
But for my final address to this chamber, I don’t want to just talk about next year. I want to focus on the next five years, the next 10 years, and beyond. I want to focus on our future.
We live in a time of extraordinary change -- change that’s reshaping the way we live, the way we work, our planet, our place in the world. It’s change that promises amazing medical breakthroughs, but also economic disruptions that strain working families. It promises education for girls in the most remote villages, but also connects terrorists plotting an ocean away. It’s change that can broaden opportunity, or widen inequality. And whether we like it or not, the pace of this change will only accelerate.
America has been through big changes before -- wars and depression, the influx of new immigrants, workers fighting for a fair deal, movements to expand civil rights. Each time, there have been those who told us to fear the future; who claimed we could slam the brakes on change; who promised to restore past glory if we just got some group or idea that was threatening America under control. And each time, we overcame those fears. We did not, in the words of Lincoln, adhere to the “dogmas of the quiet past.” Instead we thought anew, and acted anew. We made change work for us, always extending America’s promise outward, to the next frontier, to more people. And because we did -- because we saw opportunity where others saw only peril -- we emerged stronger and better than before.
What was true then can be true now. Our unique strengths as a nation -- our optimism and work ethic, our spirit of discovery, our diversity, our commitment to rule of law -- these things give us everything we need to ensure prosperity and security for generations to come.
In fact, it’s that spirit that made the progress of these past seven years possible. It’s how we recovered from the worst economic crisis in generations. It’s how we reformed our health care system, and reinvented our energy sector; how we delivered more care and benefits to our troops and veterans, and how we secured the freedom in every state to marry the person we love.
But such progress is not inevitable. It’s the result of choices we make together. And we face such choices right now. Will we respond to the changes of our time with fear, turning inward as a nation, turning against each other as a people? Or will we face the future with confidence in who we are, in what we stand for, in the incredible things that we can do together?
So let’s talk about the future, and four big questions that I believe we as a country have to answer -- regardless of who the next President is, or who controls the next Congress.
First, how do we give everyone a fair shot at opportunity and security in this new economy? (Applause.)
Second, how do we make technology work for us, and not against us -- especially when it comes to solving urgent challenges like climate change? (Applause.)
Third, how do we keep America safe and lead the world without becoming its policeman? (Applause.)
And finally, how can we make our politics reflect what’s best in us, and not what’s worst?
Let me start with the economy, and a basic fact: The United States of America, right now, has the strongest, most durable economy in the world. (Applause.) We’re in the middle of the longest streak of private sector job creation in history. (Applause.) More than 14 million new jobs, the strongest two years of job growth since the ‘90s, an unemployment rate cut in half. Our auto industry just had its best year ever. (Applause.) That's just part of a manufacturing surge that's created nearly 900,000 new jobs in the past six years. And we’ve done all this while cutting our deficits by almost three-quarters. (Applause.)
Anyone claiming that America’s economy is in decline is peddling fiction. (Applause.) Now, what is true -- and the reason that a lot of Americans feel anxious -- is that the economy has been changing in profound ways, changes that started long before the Great Recession hit; changes that have not let up.
Today, technology doesn’t just replace jobs on the assembly line, but any job where work can be automated. Companies in a global economy can locate anywhere, and they face tougher competition. As a result, workers have less leverage for a raise. Companies have less loyalty to their communities. And more and more wealth and income is concentrated at the very top.
All these trends have squeezed workers, even when they have jobs; even when the economy is growing. It’s made it harder for a hardworking family to pull itself out of poverty, harder for young people to start their careers, tougher for workers to retire when they want to. And although none of these trends are unique to America, they do offend our uniquely American belief that everybody who works hard should get a fair shot.
For the past seven years, our goal has been a growing economy that works also better for everybody. We’ve made progress. But we need to make more. And despite all the political arguments that we’ve had these past few years, there are actually some areas where Americans broadly agree.")

text_list_full <- list()
text_list <- list()

for (x in 1:20){
    for (i in 1:10) {
        text_list[[i]] <- word(text, x + i + 2, x + i + 12)
    }
    text_list_full[[x]] <- text_list
}

generate_list_fun <- function(i) {
    for (x in 1:10) {
        text_list[[x]] <- word(text, x + i + 2, x + i + 12)
    }

    text_list_full[[i]] <- text_list
}

text_list_full <- lapply(1:20, generate_list_fun)
model_p2v_list <- list()

paragraph2vec_list_fun <- function (i){
    for (x in 1:length(text_list_full[[i]])) {
        print(showConnections(all = FALSE))
        x <- text_list_full[[i]][[x]]
        x <- data.frame(doc_id = sprintf("doc_%s", seq_along(x)), 
                        text = unlist(x), stringsAsFactors = FALSE)
        model <- paragraph2vec(x, type = "PV-DBOW",
                               dim = 200, iter = 20,
                               min_count = 3, lr = 0.05, threads = 4)
    }

}

> model_p2v_list <- lapply(1:length(text_list_full), paragraph2vec_list_fun)
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
     description class mode text isopen can read can write
jusme326 commented 3 years ago

I realized that I hadn't created proper data frames within the list to reproduce the error. I've just edited the error information and code. Let me know how this one goes for you!

jwijffels commented 3 years ago

Your doc_ids should not contain spaces, and you need at least some training data to train on. If there is only 1 document, there is no point in applying doc2vec.
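Both points can be checked before training. Below is a minimal sketch in base R, assuming the training data is a data.frame with `doc_id` and `text` columns. The helper name `clean_doc_ids` and the example data are made up for illustration and are not part of the package.

```r
# Hypothetical helper (not part of doc2vec): replace whitespace in doc_id
# with underscores and refuse data sets with fewer than two documents.
clean_doc_ids <- function(x) {
  stopifnot(is.data.frame(x), all(c("doc_id", "text") %in% names(x)))
  x$doc_id <- gsub("[[:space:]]+", "_", trimws(as.character(x$doc_id)))
  if (anyDuplicated(x$doc_id) > 0) {
    stop("doc_id values are no longer unique after removing spaces")
  }
  if (nrow(x) < 2) {
    stop("need more than one document to train on")
  }
  x
}

# Illustrative data only
docs <- data.frame(
  doc_id = c("speech 2012 a", "speech 2012 b"),
  text   = c("we gather tonight", "we can do this"),
  stringsAsFactors = FALSE
)
docs <- clean_doc_ids(docs)
docs$doc_id
```

Running this over every data.frame in the list before calling the training function would surface a problematic data set early.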

jusme326 commented 3 years ago

Got it. I have edited it so that the doc_ids do not contain spaces. They also have more than 1 document. If I run the current edited file with lapply(1:7, paragraph2vec_list_fun), the function works. However, if I go above 8, it says training data not found, even though all the documents are the same.

If I run the file for lapply(1:7, paragraph2vec_list_fun) and then try to run lapply(5:7, paragraph2vec_list_fun), once again it will not run.

jwijffels commented 3 years ago

Did you check with min_count = 0? Maybe you have some training data without any words that occur more than 3 times.
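One way to check this before training is sketched below in base R. It assumes simple whitespace tokenisation, which may differ from what the package does internally, and that words occurring fewer than `min_count` times are dropped. The helper name `words_kept` is hypothetical.

```r
# Hypothetical helper: count how many distinct words in a data set occur
# at least `min_count` times, using simple whitespace tokenisation.
words_kept <- function(text, min_count = 5) {
  tokens <- as.character(unlist(strsplit(tolower(text), "[[:space:]]+")))
  tokens <- tokens[nzchar(tokens)]
  freq <- table(tokens)
  sum(freq >= min_count)
}

# Illustrative texts only
words_kept("a a a b b c", min_count = 3)  # only "a" reaches the threshold
words_kept("d e f", min_count = 3)        # no word reaches the threshold
words_kept("d e f", min_count = 0)        # every word is kept
```

A data set for which this returns 0 would leave nothing to train on at that threshold.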

jwijffels commented 3 years ago

It might just as well be that you are running out of RAM.

jusme326 commented 3 years ago

Yes, I did. It still produces the same error. Once I get the error, if I try to open even just a random file from my computer in R, I will get the error message "Too many open files".

jwijffels commented 3 years ago

I can't run that script as all my RAM got used.

jwijffels commented 3 years ago

I think this is related to the work I was doing on branch https://github.com/bnosac/doc2vec/tree/destructtrainthreads. Learning always starts from a file with text, and every thread opens that file at the C++ level. These train threads are destroyed only after the R object is no longer in scope in your session. So if you keep all your models open in your session, there will be many open connections to files, which gives you this issue. The solution is either that I destroy all the threads after training is done (that was the work I was doing in the branch above) or that you try to limit the number of models you need.
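A rough sketch of the second option: train one model per data set but keep only the document embeddings, so that no fitted model stays referenced in the session. The function name `embeddings_only` is hypothetical, and whether dropping the model and calling `gc()` actually releases the file handles in the affected version is an assumption based on the description above.

```r
# Sketch: train one model per data set but keep only the document
# embeddings, so no fitted model object stays alive in the session.
# `train` defaults to doc2vec::paragraph2vec; it is an argument here only
# so the pattern can be tried without the package installed.
embeddings_only <- function(datasets, train = doc2vec::paragraph2vec, ...) {
  lapply(datasets, function(x) {
    model <- train(x, ...)
    emb <- as.matrix(model, which = "docs")  # extract what is needed
    rm(model)                                # drop the only reference
    gc()                                     # let finalizers run now
    emb
  })
}
```

With the package installed, a call could look like `embeddings_only(list_of_data_frames, min_count = 0)`, where `list_of_data_frames` stands for your own list of training sets.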

jwijffels commented 3 years ago

This is now covered in commit https://github.com/bnosac/doc2vec/commit/b00e5deac70c882a2354f25f69c5c20dfb0b982f. If you install the package with remotes::install_github("bnosac/doc2vec"), you can test it out. I hope to release this 0.2.0 version of the package soon, which also contains an implementation of the top2vec semantic clustering algorithm and allows transfer learning based on a pretrained set of word embeddings.

jusme326 commented 3 years ago

That's super helpful. Thanks Jan. I'll test it out again and close the thread afterwards.

jwijffels commented 3 years ago

Let me close this, as open connections are no longer possible: when you build the model starting from a data.frame, the file used to construct the model is removed at the end of the model building process. Feel free to reopen if needed.