Open mzeddd opened 11 years ago
I've just downloaded the file manually, read as HTML, and written using Excel2007, and it gives me exactly what I'd expect (including the Cyrillic characters) without any problems:
$inputFileType = 'HTML';
$inputFileName = './Cyrillic_UTF-8.htm';
$outputFileType = 'Excel2007';
$outputFileName = './Cyrillic_UTF-8.xlsx';
echo date('H:i:s')." Load Tests from $inputFileType file" , PHP_EOL;
$callStartTime = microtime(true);
try {
$objPHPExcelReader = PHPExcel_IOFactory::createReader($inputFileType);
$objPHPExcelReader -> setReadDataOnly(false);
$objPHPExcel = $objPHPExcelReader->load($inputFileName);
} catch (Exception $e) {
die ('ERROR: ' . $e->getMessage());
}
$callEndTime = microtime(true);
$callTime = $callEndTime - $callStartTime;
echo 'Call time to read Workbook was '.sprintf('%.4f',$callTime)." seconds" , PHP_EOL;
// Echo memory usage
echo date('H:i:s').' Current memory usage: '.(memory_get_usage(true) / 1024 / 1024)." MB" , PHP_EOL , PHP_EOL;
echo date('H:i:s')." Save as $outputFileType file" , PHP_EOL;
$callStartTime = microtime(true);
$objPHPExcelWriter = PHPExcel_IOFactory::createWriter($objPHPExcel,$outputFileType);
$objPHPExcel = $objPHPExcelWriter->save($outputFileName);
$callEndTime = microtime(true);
$callTime = $callEndTime - $callStartTime;
echo 'Call time to write Workbook was '.sprintf('%.4f',$callTime)." seconds" , PHP_EOL;
// Echo memory usage
echo date('H:i:s').' Current memory usage: '.(memory_get_usage(true) / 1024 / 1024)." MB" , PHP_EOL , PHP_EOL;
echo date('H:i:s').' Peak memory usage: '.(memory_get_peak_usage(true) / 1024 / 1024)." MB" , PHP_EOL , PHP_EOL;
Can you try to save page using CURL as I did?
No problems using curl to retrieve the file either. Are you using the latest develop branch?
I try your method to save html table with cyrillic to xsl, but cyrillic word is broken.
$table = $this->displayTable($rows);
$tmpfile = tempnam(sys_get_temp_dir(), "html");
file_put_contents($tmpfile, $table);
try { $excelHTMLReader = PHPExcel_IOFactory::createReader("HTML"); // Set active sheet index to the first sheet, so Excel opens this as the first sheet $excelHTMLReader->setReadDataOnly(false); $objPHPExcel = $excelHTMLReader->load($tmpfile); } catch (Exception $e) { die ('ERROR: ' . $e->getMessage()); }
$objPHPExcel->getProperties()->setCreator("BPage 2.0")
->setLastModifiedBy("BPage 2.0")
->setTitle($title);
// Rename worksheet
$objPHPExcel->getActiveSheet()->setTitle($title);
unlink($tmpfile);
if ($outputType == "Excel5") {
// Redirect output to a client’s web browser (Excel5)
header("Content-Type: application/vnd.ms-excel");
header("Content-Disposition: attachment;filename=\"rep-".today(DATE_BG).".xls\"");
} else {
// Redirect output to a client’s web browser (Excel2007)
header("Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet; charset=UTF-8;"); // charset=UTF-8
header("Content-Disposition: attachment;filename=\"rep-".today(DATE_BG).".xlsx\"");
}
header('Cache-Control: max-age=0');
header('Cache-Control: max-age=1');
header('Expires: Mon, 26 Jul 1997 05:00:00 GMT'); // Date in the past
header('Last-Modified: '.gmdate('D, d M Y H:i:s').' GMT'); // always modified
header('Cache-Control: cache, must-revalidate'); // HTTP/1.1
header('Pragma: public'); // HTTP/1.0
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, $outputType);
$objWriter->save('php://output');
How resolve my problem in Render/HTML.php replase $loaded = $dom->loadHTML($this->securityScanFile($pFilename)); with $loaded = $dom->loadHTML(mb_convert_encoding($this->securityScanFile($pFilename), 'HTML-ENTITIES', 'UTF-8')); and all work correct.
Hi,
I'm trying to fetch data from HTML page and read it as an Excel table
This is what I have (code from different examples combined together)
$url = "http://paraplan.ru/forum/jonatan.php";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url); // set url to post to
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);// allow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1); // return into a variable
curl_setopt($ch, CURLOPT_HEADER, 0); curl_setopt($ch, CURLOPT_TIMEOUT, 3); // times out after 4s
$result = curl_exec($ch); // run the whole process curl_close($ch);
$outputFile = 'var/counter.html';
$fp = fopen($outputFile, 'w'); $test = fwrite($fp, $result); fclose($fp);
$objReader = PHPExcel_IOFactory::createReader('HTML'); $objPHPExcel = $objReader->load($outputFile);
With this code PHPExcel stops loading file because it is not valid HTML file.
I think you should check validation function. Possibly it is not working due to Russian content of the page (Cyrillic).
Currently in my code I disabled it to proceed with my experiments.