Closed jacksonjacobs1 closed 7 months ago
had this issue recently, try:
df = pd.read_csv('results.tsv', skiprows=5, delimiter="\t", index_col=False)
>>> df=pd.read_csv(fp,skiprows=5, delimiter="\t",index_col=False)
File "parsers.pyx", line 2058, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 30 fields in line 1695, saw 31
This did not solve the issue. I don't think the index column is the issue here. Rather, when extra \t characters are inserted, values get pushed into an extra nameless column. When I remove the extra column, cohortFinder works fine.
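To illustrate the mechanism described above, here is a minimal sketch using only the standard library. The column names and warning text are made up for the example (they are not copied from a real results.tsv), and `count_fields` is a hypothetical helper. It shows that a tab embedded in the last field gives that row one more field than the header, which is the kind of mismatch the parser error reports.

```python
import csv
import io


def count_fields(tsv_text):
    """Return the number of tab-separated fields on each non-empty line."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [len(row) for row in reader if row]


# Made-up three-column file: the last column holds the warnings.
header = "filename\tpixels_to_use\twarnings\n"
clean_row = "img1.svs\t33\tsaveMacro could not read macro image\n"
# Same kind of row, but the warning text itself contains a tab.
broken_row = "img2.svs\t33\timg2.svs-\tsaveMacro could not read macro image\n"

counts = count_fields(header + clean_row + broken_row)
print(counts)  # [3, 3, 4]
```

The last row has four fields against a three-field header, so the tail of the warning lands in a column with no name.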
Is there a reason why the warning message contains a \t character?
hmmm....i don't think it "contains" a tab
have you tried opening a results.tsv file in excel, when importing select tab delimited
when i do that , everything works as expected and lines up nicely - warnings column is empty
as well, when i look at it in notepad++, the column is empty also
here we see the final 2 columns, pixel to use is 33.... and then there is a tab and then a new line, so warning is empty
i think it's a very pandas specific thing that is causing this craziness
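One way to check this without pandas or Excel is to count the fields on every row and list the rows that disagree with the header. The sketch below assumes the header is the first row after the skipped lines (the `skiprows=5` from the `read_csv` call earlier in this thread); `find_ragged_rows` is a hypothetical helper name, not part of HistoQC or CohortFinder.

```python
import csv


def find_ragged_rows(lines, skiprows=5, delimiter="\t"):
    """Return (line_number, field_count) for rows wider or narrower than the header.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Line numbers are 1-based and count from the top of the input.
    """
    ragged = []
    expected = None
    # QUOTE_NONE keeps one input line per row, so line numbers stay accurate.
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for line_number, row in enumerate(reader, start=1):
        if line_number <= skiprows or not row:
            continue
        if expected is None:
            expected = len(row)  # the header row sets the expected width
        elif len(row) != expected:
            ragged.append((line_number, len(row)))
    return ragged


# Usage (assumes a results.tsv in the working directory):
# with open("results.tsv", newline="", encoding="utf-8") as handle:
#     print(find_ragged_rows(handle))
```

A row that ends in a tab followed by a newline still has the same number of fields as the header (the last one is just empty), so only rows with a genuinely extra tab show up in the result.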
Below I've copied and pasted a line from the file (opened in excel). Note that the second-to-last and last cells should be one cell, but the \t character in https://github.com/choosehappy/HistoQC/blob/d29c63c8de01490816bdeeb3ffbaa0920ce6c875/histoqc/SaveModule.py#L44
... causes the message to be split into two cells in the tsv. This problem only occurs when HistoQC tries to add the above warning message.
img1.svs | | (0, 0, 63743, 20112) | 20 | aperio | 3 | 20112 | 63743 | 0.5011 | 0.5011 | Aperio Image Library v12.0.15 65024x20212 [0,100 63743x20112] (240x240) JPEG/RGB Q=70\|AppMag = 20\|StripeWidth = 2032\|ScanScope ID = 00000000\|Filename = 000000\|Date = 00000000\|Time = 00000000\|Time Zone = 000000000\|User = 000000000000000000000000000000000000\|Parmset = Special Slide Settings\|MPP = 0.5011\|Left = 11.705497\|Top = 18.827839\|LineCameraSkew = 0.001923\|LineAreaXOffset = 0.032431\|LineAreaYOffset = -0.006467\|Focus Offset = 0.000000\|DSR ID = 0000000000\|ImageID = 000000\|Exposure Time = 32\|Exposure Scale = 0.000001\|DisplayColor = 0\|SessonMode = 00\|OriginalWidth = 65024\|OriginalHeight = 20212\|ICC Profile = AT2 | 0.980215412165767 | 0.000767064665569861 | 0.000444430976839105 | 1563 | 3.80102367242482 | 198 | -0.0600349639749795 | 20457 | 3.51977318277362 | 145 | 0.686406101048618 | 0 | 0 | 0 | 0 | 0.606547908560311 | 1 | 0 | \|After BasicModule.finalProcessingArea NO tissue remains detectable! Downstream modules likely to be incorrect/fail\|581421.svs- | saveMacro Can't Read 'macro' Image from Slide's Associated Images
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --

yupppppppp that'd do it. i don't think we need that "\t" in the warning message, right? if we remove it and replace it with a space, would that solve the problem?
Agreed. I'll push the commit directly into the main branch.
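For reference, the idea of swapping the tab for a space can be sketched as below. This is not the actual HistoQC change; `sanitize_tsv_field` and `make_row` are hypothetical helpers, and also replacing line breaks is an extra assumption beyond what was discussed here.

```python
def sanitize_tsv_field(value, replacement=" "):
    """Replace characters that would split or break a tab-separated row."""
    text = str(value)
    # Tabs create extra columns; line breaks (an added assumption) create extra rows.
    for ch in ("\t", "\r", "\n"):
        text = text.replace(ch, replacement)
    return text


def make_row(fields):
    """Join already-ordered field values into one TSV line (no trailing newline)."""
    return "\t".join(sanitize_tsv_field(field) for field in fields)


# Made-up example: the warning text contains an embedded tab.
row = make_row(["img2.svs", 33, "img2.svs-\tsaveMacro could not read macro image"])
print(row.count("\t"))  # 2
```

With the tab inside the warning replaced, the row has exactly two delimiters for three fields, so it lines up with a three-column header.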
https://github.com/choosehappy/HistoQC/blob/d29c63c8de01490816bdeeb3ffbaa0920ce6c875/histoqc/SaveModule.py#L44
"\t" produces a nameless column in the tsv file. CohortFinder cannot read files which have nameless columns: